similar to: How to make xapian run in hadoop

Displaying 20 results from an estimated 10000 matches similar to: "How to make xapian run in hadoop"

2019 Nov 22
0
How to make xapian run in hadoop
On Thu, Nov 21, 2019 at 10:20:19AM +0800, ??? wrote: > We use xapian as the backend of our system. Now the data need be > indexed ever-increasing, and the local mode is hard to maintain, so we > plan to move the index builder to hadoop. We try to make xapian can be > run in hadoop, and now met a problem that there are many seek > operations when xapian writes the index files, but
2023 Mar 08
1
Release plans
Olly Betts <olly at survex.com> wrote: > The current plan for the next release series includes relicensing > the C++ libxapian library in xapian-core as MPL. The remaining > blockers for this are: > > * adding update support to the new honey backend (to replace glass) Just wondering if there's docs on what improvements users can expect from honey. Mainly smaller size?
2023 Mar 06
2
Release plans
The current plan for the next release series includes relicensing the C++ libxapian library in xapian-core as MPL. The remaining blockers for this are: * adding update support to the new honey backend (to replace glass) * adding support for RAM storage to honey (to replace inmemory) * moving some remote client and server code out of libxapian (or replacing it) I'm certainly still aiming
2018 Mar 30
2
sorting large msets
Hello, is there a way to optimize sorting by certain values for queries which return a huge amount of results? For example, I just want a simple query that gives me the 200 most recent emails out of millions. The elapsed time for get_mset increases as the number of documents ($n * 2000) increases. I suppose I could store a pre-sorted set using SQLite or similar. Thanks in advance for any
2018 Mar 20
2
how to build 64bit xapian using MSVC2017?
On Tue, Mar 20, 2018 at 06:30:07PM +0000, Olly Betts wrote: > https://lists.xapian.org/pipermail/xapian-discuss/2018-January/009585.html Related to this, the appveyor build is currently failing on git master. Unfortunately the change at which it started to fail was the addition of the new "honey" backend, which doesn't narrow things down to a useful degree. I've checked over
2008 Aug 21
2
Large data sets with R (binding to hadoop available?)
Dear R community, I find R fantastic and use R whenever I can for my data analytic needs. Certain data sets, however, are so large that other tools seem to be needed to pre-process data such that it can be brought into R for further analysis. Questions I have for the many expert contributors on this list are: 1. How do others handle situations of large data sets (gigabytes, terabytes)
2009 Jul 31
1
Using R with Hadoop/Hive for Big Data
Hive <http://hadoop.apache.org/hive/> is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data and it also provides a simple query language called QL which is based on SQL and which enables users familiar with
2013 May 20
1
Glusterfs-Hadoop
Hi, Where can I find glusterfs-hadoop-0.20.2-0.1.x86_64.rpm? The following link is from the Gluster FS Admin Guide, but it doesn't exist: http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm Thanks!
2012 Nov 07
2
R + Hadoop on Amazon
Hello All, Having some issues with my local machine, I need to move to Amazon to run R and Hadoop on an Amazon instance. After searching a lot, I am unable to decide which image to choose for the Amazon instance. Is anyone using R + Hadoop on Amazon? Thanks
2016 Jul 06
2
Xapian 1.4.0 released
I have installed the new Xapian 1.4.0. During the installation I haven't seen any problems; however, when I execute the commands quest and delve I get different versions, and my Perl-based searches return Exception: Couldn't detect type of database ... and what are these glass things in the index directories? There is no new version of Perl Search::Xapian. $ quest -version quest -
2015 Dec 09
2
SVM hadoop
Good morning, does anyone know whether there is a way to implement a support vector machine (SVM) with R-hadoop? I am interested in doing big data processing with SVMs. I know that in R there are the packages {RtextTools} and {e1071} that support SVMs. But I am not sure whether the algorithm is parallelizable, that is, whether it can run in parallel through the R-hadoop platform. Many
2018 Mar 07
2
Xapian 1.4.5 "Db block overwritten - are there multiple writers?" with Glass
On Mon, Mar 05, 2018 at 09:48:52PM +0000, Olly Betts wrote: > On Mon, Mar 05, 2018 at 08:52:47PM +0100, Sylvain Taverne wrote: > > I've remarked the error occur when i'm trying to get stored values from a > > database with a lot of stored values. I can reproduce the error with simple > > python2 script i've posted on github > > > >
2018 Jul 10
2
Xapian 1.4.5 "Db block overwritten - are there multiple writers?" with Glass
On Mon, Jul 09, 2018 at 10:29:18AM +0100, Olly Betts wrote: > The attached patch reset this cursor each time commit() is called, and > that fixes my C++ reproducer, though I think this ought to work as-is > and the real bug is at a lower level. I've dug deeper and that was indeed the case. Here's a patch which addresses the root cause:
2016 Jun 25
2
Xapian 1.4.0 released
I'm delighted to announce the release of 1.4.0. You can download from: http://xapian.org/download This is a major milestone release, but the last development release (1.3.7) was essentially a release candidate so the changes are fairly minor - the only notable change is the update to Unicode 9.0.0. That means a short thank you list for this release - thanks to Andy Chilton! As always, if
2017 Dec 05
1
How to enhance the query performance for large boolean attribute
Hi all, I am a new user of Xapian, and we have run into the following problem. In our case, a document has many attributes which are boolean values, for example (A, B, C), and our search query uses certain filter logic (A == true and B == false ..) combined with other search logic. We use MatchDecider to implement the filter logic, and we have now run into a performance problem, because our self-defined
2015 Dec 10
2
SVM hadoop
Hello, You can set up RStudio on Amazon, install "caret" and off you go.... I don't know whether what Amazon can offer will be enough for your problem... I think so... ;-).... Or do it directly here, where they already have this whole setup installed: http://www.teraproc.com/front-page-posts/r-on-demand/ Thanks, Carlos. On 10 December 2015, 14:43, MªLuz Morales <mlzmrls
2015 Dec 10
3
SVM hadoop
Dear all, I once read something at the following link, but I never used it. http://blog.revolutionanalytics.com/2015/06/using-hadoop-with-r-it-depends.html Javier Rubén Marcuzzi From: Carlos J. Gil Bellosta Sent: Wednesday, 9 December 2015 14:33 To: MªLuz Morales CC: r-help-es Subject: Re: [R-es] SVM hadoop No, they will not run in parallel if you use the SVMs from packages such as e1071. No
2015 Dec 11
2
SVM hadoop
Hello Mª Luz, Let me share my view: First of all, you need to be clear about what exactly you want to do in parallel. I can think of 3 scenarios: (1) Applying a model, in this case SVM, to very large data, which is why I need hadoop/spark (2) Building many SVM models on small data (for example, one per user), which is why I need hadoop/spark to parallelize these processes
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from February 2020, I've got 390 Xapian shards for 130 public inboxes I want to search against(*). There's more on the horizon (we're expecting tens of thousands of public inboxes). After bumping RLIMIT_NOFILE and running ->add_database a bunch, the actual queries seem to be taking ~30s (not good :x). Now I'm
2010 Dec 24
1
Running scripts in hadoop
R-help group, I'm looking for some assistance on using an R-script to read STDIN from hadoop. Example, say I have two tables. One is a student table, the other is a class roster table (tables join on student_id). Student SAT score is in the student table, whether the student passed or not is in the roster table. So to determine if a student passed or failed based on their SAT score, I'd