similar to: Help-Multi class classification for large datasets

Displaying 20 results from an estimated 5000 matches similar to: "Help-Multi class classification for large datasets"

2009 Jun 19
3
please recommend hands-on books on classification, data-mining and machine learning with R?
Hi all, Could anybody please recommend some hands-on books on classification, data-mining and machine learning with R? I would like to get a very good understanding of the statistical tools that are used in these areas, while reducing the learning curve. Thank you!
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my
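A minimal R sketch of the rule of thumb described above (rank terms by corpus frequency and review the top of the list as stopword candidates). It assumes the tm package and a hypothetical directory "docs/" of plain-text files, not the poster's actual indexing setup.

    # Sketch: rank terms by overall corpus frequency as stopword candidates.
    # Assumes the 'tm' package and a hypothetical directory of text files.
    library(tm)

    corpus <- VCorpus(DirSource("docs/"))
    dtm    <- DocumentTermMatrix(corpus)

    # total frequency of each term across the corpus
    term_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

    # the most frequent, low-information terms are the usual candidates
    head(term_freq, 50)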
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello, I have a corpus that I have indexed with xapian/xappy and I would now like to generate a corpus-specific list of stopwords. (This is a technical corpus, so a typical stopword list wouldn't be helpful.) My first thought was to ask the xapian database for a list of terms followed by their frequency. My intuition is that I could probably bring together a list of stopwords by examining
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start forming document vectors that are reduced and more useful. This helps greatly in saving run time (since the time for distance calculation depends on the number of terms). Getting the useful terms within a
2008 Mar 12
1
how can i use stopwords?
Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem(german2); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2010 Nov 15
4
Stopword addition and stemming
Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (searching data from a *.co.us TLD, eg) . I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it
2011 Feb 11
4
About classification methods.
Dear R users, I'm new to R and really don't know much. I want to classify some data (two classes, many features, and a huge amount of data) using R. In this case, I want to use at least a Support Vector Machine, a Bayes-theory-based classifier, Discriminant Analysis, and regression. Which packages should I use, and can I compare the classifiers' results by their predictions? Thank you.
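A hedged sketch of one way to fit and compare the classifiers mentioned above, assuming the e1071 and MASS packages and a data frame 'dat' with a two-level factor column 'class'; the package choice and column names are illustrative, not from the original post.

    # Sketch: fit several classifiers and compare held-out predictions.
    # Assumes e1071 (svm, naiveBayes), MASS (lda) and a data frame 'dat'
    # whose factor column 'class' holds the two labels.
    library(e1071)
    library(MASS)

    train_idx <- sample(nrow(dat), round(0.7 * nrow(dat)))
    train <- dat[train_idx, ]
    test  <- dat[-train_idx, ]

    fit_svm <- svm(class ~ ., data = train)
    fit_nb  <- naiveBayes(class ~ ., data = train)
    fit_lda <- lda(class ~ ., data = train)
    fit_glm <- glm(class ~ ., data = train, family = binomial)

    # accuracy of each model on the held-out test set
    mean(predict(fit_svm, test) == test$class)
    mean(predict(fit_nb,  test) == test$class)
    mean(predict(fit_lda, test)$class == test$class)
    mean((predict(fit_glm, test, type = "response") > 0.5) ==
         (test$class == levels(test$class)[2]))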
2017 Jun 13
2
Classification and Regression Tree for Survival Analysis
I am trying to use CART in a survival analysis. I have three variables of interest (all three ordinal: x, y and z, each with 5 categories) from which I want to form smaller groups (for example, the 1st category of X with the 2nd and 3rd categories of Y and the 2nd, 3rd and 4th categories of Z, etc.) based on their, let's say, association with mortality. Now
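rpart can grow survival trees via method = "exp"; a minimal sketch, assuming a data frame 'df' with a follow-up time column 'time', an event indicator 'status', and the three ordinal predictors x, y and z from the post (column names are assumptions).

    # Sketch: a survival tree with rpart (exponential splitting).
    # Assumes columns 'time' (follow-up), 'status' (1 = event) and the
    # ordinal predictors x, y, z in a data frame 'df'.
    library(rpart)
    library(survival)

    fit <- rpart(Surv(time, status) ~ x + y + z,
                 data = df, method = "exp")

    plot(fit); text(fit, use.n = TRUE)   # terminal nodes suggest groupings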
2013 Mar 29
3
Installing views in R2.15.3
Hi, what am I doing wrong, please? I tried installing some task views (e.g. MachineLearning, Multivariate) on R 2.15.3, but it keeps telling me that the package is not available for 2.15.3. Is that true? Thanks
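CRAN task views are installed through the ctv package rather than install.packages() directly; a short sketch of the usual steps, using the view names mentioned in the post.

    # Sketch: installing CRAN task views via the 'ctv' package.
    install.packages("ctv")
    library(ctv)

    install.views("MachineLearning")   # installs the packages in the view
    install.views("Multivariate")
    # update.views("MachineLearning")  # later, to update them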
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
Hi, I bumped into a serious issue while trying to analyse some texts in Bulgarian language (with the tm package). I import a tab-separated csv file, which holds a total of 22 variables, most of which are text cells (not factors), using the read.delim function: data<-read.delim("bigcompanies_ascii.csv", header=TRUE, quote="'",
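tm's built-in stopwords() lists do not include Bulgarian, so a custom Cyrillic stopword vector has to be supplied by hand; a minimal sketch, assuming a UTF-8 encoded file and a hypothetical text column 'text' (file name, column name and the example stopwords are illustrative).

    # Sketch: removing custom Cyrillic (Bulgarian) stopwords with tm.
    # Assumes a UTF-8 file and a hypothetical column 'text' holding the
    # documents; the stopword vector must be supplied manually.
    library(tm)

    data <- read.delim("bigcompanies_utf8.csv", header = TRUE,
                       fileEncoding = "UTF-8", stringsAsFactors = FALSE)

    bg_stopwords <- c("и", "на", "в", "за", "с")   # illustrative only

    corpus <- VCorpus(VectorSource(data$text))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removeWords, bg_stopwords)
    corpus <- tm_map(corpus, removePunctuation)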
2002 Apr 16
6
Classification Analysis
Hi everyone, could somebody tell me which package/function to use for classification analysis? I am performing analysis of music files in the form of MIDI files. I end up with about 750 dependent variables from the analysis, and I also have a number of independent/grouping variables that I set manually. What I would like is to be able to predict which group a particular MIDI file belongs to
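With roughly 750 numeric descriptors per file, one common option (among several) is a classifier that copes well with many predictors, such as a random forest; a hedged sketch assuming a data frame 'midi' with a factor column 'group' and the descriptors in the remaining columns (names are illustrative).

    # Sketch: predicting the group of a MIDI file from its descriptors.
    # Assumes a data frame 'midi' with a factor column 'group' and ~750
    # numeric descriptor columns (names are illustrative).
    library(randomForest)

    fit <- randomForest(group ~ ., data = midi, importance = TRUE)

    predict(fit, newdata = midi[1:5, ])   # predicted group membership
    varImpPlot(fit)                       # which descriptors matter most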
2005 Jul 19
2
data mining
Dear all, I'm looking for some material on data mining with R. I have something from Luis Torgo but I'd like to see something else. If anybody could help me, I'd be thankful. Adrián
2009 Jul 17
3
Help with the text mining (TM) package
Dear all, I am writing to ask about the following: I am doing a text mining project and need to import a series of texts in order to preprocess them, that is, remove stopwords, do stemming, remove punctuation marks, etc. I can do this last part with the datasets that come with the TM library. What I cannot manage is to import text from some source, even though there are functions
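Text is usually imported into tm through a source object such as DirSource; a minimal sketch assuming a hypothetical folder of plain-text files, followed by the preprocessing steps the poster lists (path and language are assumptions).

    # Sketch: importing a folder of plain-text files into tm and
    # preprocessing them (path and language are illustrative).
    library(tm)
    library(SnowballC)

    corpus <- VCorpus(DirSource("textos/", encoding = "UTF-8"),
                      readerControl = list(language = "es"))

    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removeWords, stopwords("spanish"))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, stemDocument, language = "spanish")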
2007 Jan 19
9
Double-quoted query with "and" fails.
Hi, We're using Ferret 0.9.4 and we've observed the following behavior. Searching for 'fieldname: foo and bar' works fine while 'fieldname: "foo and bar"' doesn't return any results. Is there a way to make ferret recognize the 'and' inside the query as a search term and not an operator? (I hope I got the
2004 Dec 13
2
classification for huge datasets: SVM yields memory troubles
Hi, I have a matrix with 30 observations and roughly 30000 variables; each observation belongs to one of two groups. With svm and slda I run into memory trouble ('cannot allocate vector of size' roughly 2G). PCA + LDA runs fine. Is there any way to work around the memory issue with SVMs? Or can you recommend any other classification method for such huge datasets? P.S. I run SuSE 9.1 on a 2G RAM
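One common workaround (a suggestion, not from the original thread) is to filter the variables before fitting the SVM, e.g. keeping only the few hundred columns with the largest two-sample t statistics; a sketch assuming a 30 x 30000 matrix 'X', a two-level factor 'y', and the e1071 package.

    # Sketch: filter the 30000 variables down to a few hundred before
    # the SVM, to cut memory use. Assumes a 30 x 30000 matrix 'X' and a
    # two-level factor 'y'; the filtering step is a suggestion only.
    library(e1071)

    # absolute two-sample t statistic for each column
    tstat <- apply(X, 2, function(v) abs(t.test(v ~ y)$statistic))

    keep <- order(tstat, decreasing = TRUE)[1:200]   # top 200 variables
    fit  <- svm(x = X[, keep], y = y)

    predict(fit, X[1:5, keep])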
2017 Oct 02
2
R and Supervised learning
Hi, I currently find myself manually selecting, from several hundred Google Alerts (GA) texts, those that are actually relevant for my research versus those that are not (even though they are triggered by some relevant search keywords). Basically, each week I get several hundred GA emails such as:
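One workable pipeline is to hand-label a sample of past alerts, build a document-term matrix with tm, and train a classifier on it; a hedged sketch assuming a character vector 'alert_text' and a factor 'relevant' (both hypothetical), with e1071's naive Bayes as one possible model.

    # Sketch: classify Google Alerts texts as relevant / not relevant.
    # Assumes a character vector 'alert_text' and a factor 'relevant'
    # for a hand-labelled sample (both hypothetical).
    library(tm)
    library(e1071)

    corpus <- VCorpus(VectorSource(alert_text))
    dtm    <- DocumentTermMatrix(corpus,
                                 control = list(removePunctuation = TRUE,
                                                stopwords = TRUE))

    dat <- as.data.frame(as.matrix(dtm))
    fit <- naiveBayes(dat, relevant)

    predict(fit, dat[1:5, ])   # predicted relevance for the first alerts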
2007 Aug 16
1
Regression tree: labels in the terminal nodes
Dear everybody, I'm a new user of R 2.4.1 and I'm looking for information on improving the output of regression tree graphs. In the terminal nodes I am so far able to indicate the number of values (n) and the mean of all values in the terminal node with the command > text(tree, use.n=T, xpd=T) Yet I would like to automatically indicate in the output graph of the tree some
2012 Dec 13
2
Term matrix size and memory. TM package
Hello everyone! I am having some problems with the size of the term matrix I obtain. The commands I use are the following: # load libraries library(tm) library(wordcloud) library(Rstem) library(Snowball) # read the UTF-8 document and convert it to ASCII txt <-
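The usual way to shrink a huge DocumentTermMatrix in tm is removeSparseTerms(), which drops the terms missing from most documents; a minimal sketch assuming a matrix 'dtm' built with DocumentTermMatrix() as in the post (the sparsity threshold is illustrative).

    # Sketch: shrinking a large DocumentTermMatrix before converting it
    # to a dense matrix. Assumes 'dtm' was built with DocumentTermMatrix().
    library(tm)

    dim(dtm)                                   # documents x terms
    dtm_small <- removeSparseTerms(dtm, 0.99)  # keep terms present in >= 1% of docs

    # only convert the reduced matrix to dense form
    m <- as.matrix(dtm_small)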
2007 May 09
3
bug when assigning new analyzer?
require 'rubygems' require 'ferret' include Ferret PATH = '/tmp/ferret_stopwords_test' index = Index::IndexWriter.new(:path => PATH, :create => true) index.analyzer = Analysis::StandardAnalyzer.new([]) index << {:title => 'a few good men', :language => 'en'} index.analyzer =