thr3ads.net - similar to: "A query about clustering Idea"

Displaying 20 results from an estimated 6000 matches similar to: "A query about clustering Idea"

2016 Mar 20

A query about clustering Idea

Hello sir! Thank you for your kind reply. I have visited the GitHub repository before, but the code was not commented properly; still, I am trying to understand it and figure out the redundant task in the algorithm by reading the code. That is why I asked whether you have any other document or information that could help me understand it better. It's okay

2016 Mar 22

A query about clustering Idea

You mean that to get a review of the proposal I need to create an application? If that is the case, please tell me how to create the application. On Tue, Mar 22, 2016 at 6:24 PM, James Aylett <james-xapian at tartarus.org> wrote: > On Tue, Mar 22, 2016 at 12:49:58PM +0530, MURTUZA BOHRA wrote: > > > I have prepared a draft proposal for clustering problem. If you have > >

2016 Mar 22

A query about clustering Idea

Sir, I have prepared a draft proposal for the clustering problem. If you have a previous year's accepted proposal, please share it; that would be very helpful. I have also attached my proposal, please review it. Thanks On Mon, Mar 21, 2016 at 7:10 PM, James Aylett <james-xapian at tartarus.org> wrote: > On Sun, Mar 20, 2016 at 11:39:37PM +0530, MURTUZA BOHRA wrote: > > >

2016 Mar 23

clustering technique using lsi

Hello sir, You have interpreted correctly that clustering will be done by generating a ring around the document (i.e. the basic idea of LSI). But it is not the case that increasing the radius makes the next shell another cluster. Rather, it would pick one document (based on relevance score) and form a ring around it to cluster the document, then from the remaining documents (not in the cluster but

2016 Mar 22

clustering technique using lsi

I am still trying to find a faster clustering technique for search results. One technique that struck me is using Latent Semantic Indexing to cluster the search results, which can give better results. With it we don't even need to iterate over different values of 'k' (as in the K-means algorithm) to cluster documents; rather, we can cluster the whole search result in one go. How Latent

2016 Apr 07

Clustering Project

Sir, until now I was confident enough to handle this project in the summer; my only small concern was the algorithm that I proposed. The thing is, while writing the proposal I had not read many research papers on clustering, just a few, and an idea struck me, so I wrote the proposal and discussed it with you. But for the last few days I was searching for the

2016 Mar 06

GSOC-2016 Project : Clustering of search results

On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote: > On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote: > > K-Means or something related certainly seems like a viable approach, > so what you'll need to do is to come up with a proposal of how you'd > implement this in Xapian (either with reference to the previous work, >

2016 Mar 10

Introduction and Doubts

I was not sharing it on the mailing list because I thought that someone could use all the ideas I proposed in their GSoC proposal. Surely I will contribute to the Xapian project. Sorry if that was against the rules. The algorithm was not developed by me; I found it after much research on various clustering techniques. There is a new algorithm called CLUBS (Clustering Using Binary Splitting) which gives

2016 Mar 10

Introduction and Doubts

Tf-idf is the most widely used weighting scheme; it is easy to understand and has been used in other frameworks like Lucene and in many other places. Okapi BM25 (implemented in Xapian) is theoretically a better, improved measure than tf-idf, and I am looking into various other weighting schemes which are in Xapian or can be implemented, like TF-ICF (term frequency inverse corpus frequency), TF-RF(term

2016 May 05

GSoC 2016 - Introduction

Hello, Thanks James for the reply. That cleared a few things up. Apologies for replying late; I have exams going on. I was going through the previous clustering API to understand how it worked, and it seems that the approach for constructing the termlists used for distance metrics uses TF-IDF weighting with cosine similarity, which is very similar to the approach I would need
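
The TF-IDF weighting with cosine similarity mentioned in this snippet can be illustrated with a minimal sketch. This is not Xapian's clustering API: the function names `tfidf_vectors` and `cosine_similarity` are hypothetical, and the plain `tf * log(N / df)` weighting is an assumption chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into sparse {term: weight} dicts.

    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        term_freq = Counter(tokens)
        vectors.append(
            {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector for the dot product
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


docs = [
    ["xapian", "search", "cluster"],
    ["xapian", "search", "index"],
    ["weather", "rain"],
]
vectors = tfidf_vectors(docs)
```

Documents that share terms get a similarity between 0 and 1, and documents with no terms in common get 0; a distance for clustering could then be taken as one minus the similarity.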

2016 Mar 05

GSOC-2016 Project : Clustering of search results

Hello devs, I am Richhiey Thomas, pursuing my third year of undergraduate studies in Computer Science from Mumbai University. I had gone through the project list for this year and the project idea based on clustering caught my attention. I spoke to Assem Chelli on IRC who guided me to the code and got me started. I started going through the code and have successfully built Xapian on my machine.

2016 Aug 17

KMeans - Evaluation Results

> How long does 200-300 documents take to cluster? How does it grow as more > documents are included in the MSet? We'd expect an MSet of 1000 documents > to take longer to cluster than one with 100, but the important thing is > _how_ the time increases as the number of documents grows. > > Currently, the number of seconds taken for clustering a set of documents for varying

2016 Jul 28

Weighting Schemes: Evaluation results

Ah. If FIRE doesn't have something that can show this suitably, then > maybe Parth can advise on access to TREC, as I know he's used some of > them in the past. > I can say FIRE is also a reliable source but INEX/TREC are better. INEX can give you free access and TREC is not freely available. I had used INEX for Xapian in the past and some details are here:

2016 Mar 09

Introduction and Doubts

Hello All, I am Nirmal Singhania from NIIT University, India. I am interested in the Clustering of search results topic. I have been in the field of practical machine learning and information retrieval for quite some time. I took various courses/MOOCs on Information Retrieval and Text Mining and have been working on real-life datasets (KDD99, AWID, Movielens). Because the problems you face in real life ML/IR

2016 May 01

GSoC 2016 - Introduction

Before going ahead with the tests as you mentioned above, I would just like to clarify a few higher-level things that I am still in doubt about. 1) As discussed during the IRC interview, it was suggested that I first implement a plain K-means clustering implementation and then add the PSO module as functionality that can be used to improve the quality of clustering, with speed as the trade-off.

2016 Mar 19

Regarding GSoC 2016 project idea

Hello, I am Ainish Dave from Ahmedabad, India. I am currently pursuing a master's degree, with a background in computer science. I browsed through the list of organizations on the GSoC program page, and Xapian was the one which most closely matched my interests and background. I had some queries: 1) I am interested in the project 'Clustering of Search Results'. I have followed the

2016 Aug 07

Weighting Schemes: Evaluation results

Hi, Evaluation of pivoted normalization ("PPP") of the tf-idf weighting scheme is also complete now. I have also evaluated the default tf-idf normalization ("ntn") and other normalization combinations involving pivoted normalization in the wdfn, idfn and wtn components, as "Pxx", "xPx" and "xxP" normalization strings respectively, to have a clear idea about

2016 Jul 26

K MEANS clustering

Hello, I've been working on the KMeans clustering algorithm recently, and for the past week I have been stuck on a problem which I'm not able to find a solution to. Since we are representing documents as tf-idf vectors, they are really sparse vectors (a usual corpus can have around 5000 terms). So it gets really difficult to represent these sparse vectors in a way that would be
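
One common way to handle sparse tf-idf vectors in K-means is to store only the non-zero weights, sketched below. The snippet is cut off before it states what the author needed, so this is only an illustration under that assumption, not the approach taken in the thread; `sparse_centroid` and `squared_distance` are hypothetical helper names.

```python
from collections import defaultdict


def sparse_centroid(vectors):
    """Mean of sparse vectors, each stored as a {term: weight} dict.

    Terms absent from a vector count as weight 0, so only terms that
    occur in at least one vector appear in the centroid.
    """
    if not vectors:
        return {}
    totals = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] += weight
    count = len(vectors)
    return {term: total / count for term, total in totals.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    terms = a.keys() | b.keys()  # only terms present in either vector matter
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms)


centroid = sparse_centroid([{"a": 2.0}, {"a": 4.0, "b": 2.0}])
distance = squared_distance({"a": 1.0}, {"b": 2.0})
```

With this layout the cost of each operation depends on the number of non-zero terms in the documents involved rather than on the vocabulary size.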

2017 Mar 16

GSoC-2017 Introduction and Project Discussion

Hello, I'm Shivang Bansal, a 3rd year Computer Science Engineering undergraduate at the Institute of Engineering & Technology in Lucknow, India. This mail is an expression of my interest in this year's Google Summer of Code program. I want to apologize for getting in touch so late. I would have contacted you earlier, but the sudden demise of my grandfather prevented me from doing so. I am

2018 Jun 13

r-base-dev not installing in Ubuntu 16.04

Hello All, When I try to install r-base-dev on my Ubuntu 16.04 it gives me the following error: r-base-dev : Depends: dh-r but it is not installable E: Unable to correct problems, you have held broken packages. I added the following two repos: deb https://mirrors.ebi.ac.uk/CRAN/bin/linux/ubuntu xenial-cran35 deb http://uk-mirrors.evowise.com/ubuntu/ bionic-backports main restricted universe but
