Displaying 20 results from an estimated 5000 matches similar to: "Help-Multi class classification for large datasets"
2009 Jun 19
3
please recommend hands-on books on classification, data-mining and machine learning with R?
Hi all,
Could anybody please recommend some hands-on books on classification,
data-mining and machine learning with R? I would like to get a very
good understanding of the statistical tools that are used in these
areas, while reducing the learning curve.
Thank you!
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was
wondering if anyone had any good rules of thumb on how to pick which
words to blacklist. It all seems a little... well... vague. Although I
guess it kind of depends on the sort of documents you're wanting to index.
My current idea is to write a little script to output the terms with the
highest frequency in my
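A minimal sketch of that frequency-based approach, written in R purely for illustration (the original post concerns a Xapian index; the docs/ directory and the cut-off of 50 terms are assumptions):

# Count term frequencies across plain-text files in ./docs/ and list the
# most common words as stopword candidates for manual review.
files <- list.files("docs", full.names = TRUE)
words <- unlist(lapply(files, function(f) {
  txt <- tolower(paste(readLines(f, warn = FALSE), collapse = " "))
  strsplit(txt, "[^a-z]+")[[1]]
}))
freq <- sort(table(words[nchar(words) > 0]), decreasing = TRUE)
head(freq, 50)   # eyeball the top 50 before blacklisting any of them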
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello,
I have a corpus that I have indexed with xapian/xappy and I would now
like to generate a corpus-specific list of stopwords. (This is a
technical corpus, so a typical stopword list wouldn't be helpful.)
My first thought was to ask the xapian database for a list of terms
followed by their frequency. My intuition is that I could probably bring
together a list of stopwords by examining
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello,
I have finished moving the API to PIMPL classes and will fix issues within
the current code over the next week, based on reviews from mentors.
The next step is to start forming document vectors that are reduced and
more useful. This helps greatly in cutting run time (since the time for a
distance calculation depends on the number of terms). Getting the
useful terms within a
2008 Mar 12
1
how can i use stopwords?
Hi,
I do not understand the stopword function...
I've set the termgenerator like this:
$self->{'Stemmer'} = new Search::Xapian::Stem('german2');
$self->{'Stopper'} = new Search::Xapian::SimpleStopper();
$self->{'TermGenerator'} = new Search::Xapian::TermGenerator;
$self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2010 Nov 15
4
Stopword addition and stemming
Hi,
Two questions which I'm unsure about:
Stemming: I've turned on stemming, etc, but how can I confirm that
it's being used in searches? What should I look/search for?
Stopwords: I'm trying out xapian on a regional dataset (searching
data from a *.co.us TLD, e.g.). I've noticed that searching for [bob
co.us] results in *very* slow search times (tens of seconds), since it
2011 Feb 11
4
About classification methods.
Dear R users,
I'm new to R and really don't know much.
I want to classify some data (two classes, many features, and a very large number of observations) using R.
In this case I want to use at least a Support Vector Machine, a Bayes-based classifier, Discriminant Analysis, and a regression-based method.
Which packages should I use, and can I compare the classifiers through their predictions?
Thank you.
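A minimal sketch of how such a comparison could look, using e1071 (svm, naiveBayes), MASS (lda) and glm on a two-class subset of iris as a stand-in for the poster's data; the package choices are suggestions, not a prescription:

# Fit several classifiers on the same train/test split and compare
# test-set accuracy.
library(e1071)   # svm(), naiveBayes()
library(MASS)    # lda()

d <- droplevels(subset(iris, Species != "setosa"))
set.seed(1)
idx   <- sample(nrow(d), 0.7 * nrow(d))
train <- d[idx, ]; test <- d[-idx, ]

acc <- function(pred) mean(pred == test$Species)

m_svm <- svm(Species ~ ., data = train)
m_nb  <- naiveBayes(Species ~ ., data = train)
m_lda <- lda(Species ~ ., data = train)
m_glm <- glm(Species ~ ., data = train, family = binomial)

c(svm   = acc(predict(m_svm, test)),
  bayes = acc(predict(m_nb,  test)),
  lda   = acc(predict(m_lda, test)$class),
  logit = acc(ifelse(predict(m_glm, test, type = "response") > 0.5,
                     levels(d$Species)[2], levels(d$Species)[1])))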
2017 Jun 13
2
Classification and Regression Tree for Survival Analysis
I am trying to use CART in a survival analysis. I have three variables of interest (all three ordinal: x, y and z, each with 5 categories) from which I want to build smaller groups (just as an example, the 1st category of X with the 2nd and 3rd categories of Y and the 2nd, 3rd and 4th categories of Z, etc.) based on their association with, let's say, mortality.
Now
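One possible starting point (a sketch, not the poster's solution): rpart can grow a survival tree directly when the response is a Surv object. The data frame dat and the columns time and death below are hypothetical.

library(rpart)
library(survival)

# Hypothetical data: ordinal predictors x, y, z plus follow-up time and a
# 0/1 mortality indicator. The terminal nodes define the smaller groups.
fit <- rpart(Surv(time, death) ~ x + y + z, data = dat,
             method = "exp",
             control = rpart.control(cp = 0.01, minbucket = 20))
print(fit)
plot(fit); text(fit, use.n = TRUE)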
2013 Mar 29
3
Installing views in R2.15.3
Hi,
What am I doing wrong, please? I tried installing some task views (e.g. MachineLearning, Multivariate) on R 2.15.3, but it keeps telling me that the package is not available for 2.15.3. Is that really the case?
Thanks
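For what it's worth, CRAN task views are installed through the ctv package rather than by calling install.packages() on the view name, which would explain the "not available" message if the latter was tried; a short sketch:

install.packages("ctv")
library(ctv)
install.views("MachineLearning")   # installs the packages listed in the view
# update.views("MachineLearning")  # later, to bring them up to date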
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
Hi,
I bumped into a serious issue while trying to analyse some texts in
Bulgarian language (with the tm package). I import a tab-separated csv
file, which holds a total of 22 variables, most of which are text cells
(not factors), using the read.delim function:
data<-read.delim("bigcompanies_ascii.csv",
header=TRUE,
quote="'",
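A minimal sketch of one way the stopword removal could be wired up with tm, using a hand-built Bulgarian stopword vector (tm does not ship one); the column name text and the example stopwords are assumptions:

library(tm)

# Encoding caveat: the stopword vector and the text must use the same
# encoding (e.g. both UTF-8) for removeWords() to match anything.
data <- read.delim("bigcompanies_ascii.csv", header = TRUE, quote = "'",
                   stringsAsFactors = FALSE)
bg_stop <- c("и", "на", "за", "от", "да")      # illustrative list only
corp <- Corpus(VectorSource(data$text))        # 'text' column is assumed
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, bg_stop)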
2002 Apr 16
6
Classification Analysis
Hi everyone,
Could somebody explain to me which package/function to use for
classification analysis? I am analysing music in the form of MIDI files.
I end up with about 750 dependent variables from the analysis, and I also
have a number of independent/grouping variables that I set manually. What
I would like is to be able to predict which group a particular MIDI file
belongs to
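A hedged sketch of one way to do this in R with a random forest, which copes well with ~750 predictors; the data frame midi, the factor column group and new_midi are placeholders for the poster's data:

library(randomForest)

# midi: one row per MIDI file, ~750 numeric feature columns plus a factor
# column 'group' holding the manually assigned grouping variable.
fit <- randomForest(group ~ ., data = midi, ntree = 500, importance = TRUE)
predict(fit, newdata = new_midi)     # predicted group for new MIDI features
varImpPlot(fit, n.var = 20)          # which of the ~750 features matter most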
2005 Jul 19
2
data mining
Dear all,
I'm looking for some material on data mining with R. I have something
from Luis Torgo but I'd like to see something else.
If anybody could help me, I'd be thankful.
Adrián
2009 Jul 17
3
Help with the text mining package (TM)
Dear all, I'm writing to ask about the following:
I'm doing a text-mining project and I need to import a series of texts in
order to preprocess them, i.e. remove the stopwords, do stemming, remove
punctuation marks, etc. That last part I can do with the datasets that come
with the TM library. What I can't manage is to import text
from any source, even though there are functions
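A minimal sketch of importing texts from a directory on disk into a tm corpus and then applying the preprocessing steps mentioned; the directory name textos/ is an assumption:

library(tm)

corp <- Corpus(DirSource("textos", encoding = "UTF-8"))   # one file per text
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("spanish"))
corp <- tm_map(corp, stemDocument, language = "spanish")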
2007 Jan 19
9
Double-quoted query with "and" fails.
Hi,
We're using Ferret 0.9.4 and we've observed the following behavior.
Searching for 'fieldname: foo and bar' works fine while 'fieldname:
"foo and bar"' doesn't return any results. Is there a way to make
ferret recognize the 'and' inside the query as a search term and not
an operator? (I hope I got the
2004 Dec 13
2
classification for huge datasets: SVM yields memory troubles
Hi
I have a matrix with 30 observations and roughly 30,000 variables; each
observation belongs to one of two groups. With svm and slda I run into
memory trouble ('cannot allocate vector of size' roughly 2G). PCA followed
by LDA runs fine. Is there any way to work around the memory issue with
SVMs? Or can you recommend any other classification method for such huge
datasets?
P.S. I run SuSE 9.1 with 2G of RAM
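A hedged sketch of one common workaround: filter the 30,000 variables down to a manageable subset (here by variance, purely as an illustration) before handing them to e1071::svm; X and y are placeholders for the poster's matrix and group labels:

library(e1071)

keep <- order(apply(X, 2, var), decreasing = TRUE)[1:1000]  # top 1000 features
fit  <- svm(x = X[, keep], y = y, kernel = "linear", cross = 5)
fit$tot.accuracy                     # 5-fold cross-validated accuracy (%)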
2017 Oct 02
2
R and Supervised learning
Hi,
I currently find myself manually sorting several hundred Google Alerts
(GA) texts into those that are actually relevant for my research and those
that are not (even though they were triggered by relevant search
keywords).
Basically, each week I get several hundred GA emails such as:
2007 Aug 16
1
Regression tree: labels in the terminal nodes
Dear everybody,
I'm a new user of R 2.4.1 and I'm looking for information on improving
the output of regression tree graphs.
In the terminal nodes I am so far able to show the number of values (n)
and the mean of all values in that terminal node with the command
> text(tree, use.n=T, xpd=T)
but I would also like to automatically show, in the output graph of the
tree, some
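A small sketch of the kind of labelling that can be tuned, shown with an rpart tree; rpart.plot is a separate add-on package and the car.test.frame example data is used only for illustration:

library(rpart)
library(rpart.plot)

fit <- rpart(Mileage ~ Weight + Type, data = car.test.frame, method = "anova")
plot(fit); text(fit, use.n = TRUE, all = TRUE, xpd = TRUE, cex = 0.8)
rpart.plot(fit, type = 4, extra = 101)  # fitted value, n and % in every node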
2012 Dec 13
2
Size of the term matrix and memory. TM package
Hello everyone!
I'm having some problems with the size of the term matrix I get. The commands I use are the following:
# load libraries
library(tm)
library(wordcloud)
library(Rstem)
library(Snowball)
# read the UTF-8 document and convert it to ASCII
txt <-
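A minimal follow-on sketch (assuming the preprocessed corpus is called corp): the usual way to shrink the term matrix, and the memory it needs, is to drop very sparse terms:

dtm <- DocumentTermMatrix(corp)            # 'corp' = the preprocessed corpus
dim(dtm)                                   # documents x terms before pruning
dtm_small <- removeSparseTerms(dtm, 0.99)  # drop terms missing from >99% of docs
dim(dtm_small)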
2007 May 09
3
bug when assigning new analyzer?
require 'rubygems'
require 'ferret'
include Ferret
PATH = '/tmp/ferret_stopwords_test'
index = Index::IndexWriter.new(:path => PATH, :create => true)
index.analyzer = Analysis::StandardAnalyzer.new([])
index << {:title => 'a few good men', :language => 'en'}
index.analyzer =