Displaying 20 results from an estimated 5000 matches similar to: "Help-Multi class classification for large datasets"
2009 Jun 19
3
please recommend hands-on books on classification, data-mining and machine learning with R?
Hi all,
Could anybody please recommend some hands-on books on classification,
data-mining and machine learning with R? I would like to get a very
good understanding of the statistical tools that are used in these
areas, while reducing the learning curve.
Thank you!
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was
wondering if anyone had any good rules of thumb on how to pick which
words to blacklist. It all seems a little... well... vague. Although I
guess it kind of depends on the sort of documents you're wanting to index.
My current idea is to write a little script to output the terms with the
highest frequency in my
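A minimal sketch of that frequency-based approach, written in R purely for illustration (the original post concerns a Xapian index; the docs/ directory and the cut-off of 50 terms are assumptions):

# Count term frequencies across plain-text files in ./docs/ and list the
# most common words as stopword candidates for manual review.
files <- list.files("docs", full.names = TRUE)
words <- unlist(lapply(files, function(f) {
  txt <- tolower(paste(readLines(f, warn = FALSE), collapse = " "))
  strsplit(txt, "[^a-z]+")[[1]]
}))
freq <- sort(table(words[nchar(words) > 0]), decreasing = TRUE)
head(freq, 50)   # eyeball the top 50 before blacklisting any of them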
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello,
I have a corpus that I have indexed with xapian/xappy and I would now
like to generate a corpus-specific list of stopwords. (This is a
technical corpus, so a typical stopword list wouldn't be helpful.)
My first thought was to ask the xapian database for a list of terms
followed by their frequency. My intuition is that I could probably bring
together a list of stopwords by examining
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello,
I have finished moving the API to PIMPL classes and will fix issues within
the current code over the next week, based on reviews from mentors.
The next step is to start forming document vectors that are reduced and
more useful. This helps greatly in cutting run time (since the time for a
distance calculation depends on the number of terms). Getting the
useful terms within a
2008 Mar 12
1
how can i use stopwords?
Hi,
I do not understand the stopword function...
I've set the termgenerator like this:
$self->{'Stemmer'} = new Search::Xapian::Stem('german2');
$self->{'Stopper'} = new Search::Xapian::SimpleStopper();
$self->{'TermGenerator'} = new Search::Xapian::TermGenerator;
$self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2010 Nov 15
4
Stopword addition and stemming
Hi,
Two questions which I'm unsure about:
Stemming: I've turned on stemming, etc, but how can I confirm that
it's being used in searches? What should I look/search for?
Stopwords: I'm trying out xapian on a regional dataset (searching
data from a *.co.us TLD, e.g.). I've noticed that searching for [bob
co.us] results in *very* slow search times (tens of seconds), since it
2011 Feb 11
4
About classification methods.
Dear R users,
I'm new to R and really don't know much.
I want to classify some data (two classes, many features, and a very large number of observations) using R.
In this case I want to use at least a Support Vector Machine, a Bayes-based classifier, Discriminant Analysis, and a regression-based method.
Which packages should I use, and can I compare the classifiers through their predictions?
Thank you.
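A minimal sketch of how such a comparison could look, using e1071 (svm, naiveBayes), MASS (lda) and glm on a two-class subset of iris as a stand-in for the poster's data; the package choices are suggestions, not a prescription:

# Fit several classifiers on the same train/test split and compare
# test-set accuracy.
library(e1071)   # svm(), naiveBayes()
library(MASS)    # lda()

d <- droplevels(subset(iris, Species != "setosa"))
set.seed(1)
idx   <- sample(nrow(d), 0.7 * nrow(d))
train <- d[idx, ]; test <- d[-idx, ]

acc <- function(pred) mean(pred == test$Species)

m_svm <- svm(Species ~ ., data = train)
m_nb  <- naiveBayes(Species ~ ., data = train)
m_lda <- lda(Species ~ ., data = train)
m_glm <- glm(Species ~ ., data = train, family = binomial)

c(svm   = acc(predict(m_svm, test)),
  bayes = acc(predict(m_nb,  test)),
  lda   = acc(predict(m_lda, test)$class),
  logit = acc(ifelse(predict(m_glm, test, type = "response") > 0.5,
                     levels(d$Species)[2], levels(d$Species)[1])))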
2017 Jun 13
2
Classification and Regression Tree for Survival Analysis
I am trying to use CART in a survival analysis. I have three variables of interest (all three ordinal: x, y and z, each with 5 categories) from which I want to build smaller groups (just as an example, the 1st category of X with the 2nd and 3rd categories of Y and the 2nd, 3rd and 4th categories of Z, etc.) based on their association with, let's say, mortality.
Now
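One possible starting point (a sketch, not the poster's solution): rpart can grow a survival tree directly when the response is a Surv object. The data frame dat and the columns time and death below are hypothetical.

library(rpart)
library(survival)

# Hypothetical data: ordinal predictors x, y, z plus follow-up time and a
# 0/1 mortality indicator. The terminal nodes define the smaller groups.
fit <- rpart(Surv(time, death) ~ x + y + z, data = dat,
             method = "exp",
             control = rpart.control(cp = 0.01, minbucket = 20))
print(fit)
plot(fit); text(fit, use.n = TRUE)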
2013 Mar 29
3
Installing views in R2.15.3
Hi,
What am I doing wrong, please? I tried installing some task views (e.g. MachineLearning, Multivariate) on R 2.15.3, but it keeps telling me that the package is not available for 2.15.3. Is that really the case?
Thanks
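For what it's worth, CRAN task views are installed through the ctv package rather than by calling install.packages() on the view name, which would explain the "not available" message if the latter was tried; a short sketch:

install.packages("ctv")
library(ctv)
install.views("MachineLearning")   # installs the packages listed in the view
# update.views("MachineLearning")  # later, to bring them up to date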
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
Hi,
I bumped into a serious issue while trying to analyse some texts in
Bulgarian language (with the tm package). I import a tab-separated csv
file, which holds a total of 22 variables, most of which are text cells
(not factors), using the read.delim function:
data<-read.delim("bigcompanies_ascii.csv",
header=TRUE,
quote="'",
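A minimal sketch of one way the stopword removal could be wired up with tm, using a hand-built Bulgarian stopword vector (tm does not ship one); the column name text and the example stopwords are assumptions:

library(tm)

# Encoding caveat: the stopword vector and the text must use the same
# encoding (e.g. both UTF-8) for removeWords() to match anything.
data <- read.delim("bigcompanies_ascii.csv", header = TRUE, quote = "'",
                   stringsAsFactors = FALSE)
bg_stop <- c("и", "на", "за", "от", "да")      # illustrative list only
corp <- Corpus(VectorSource(data$text))        # 'text' column is assumed
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, bg_stop)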
2002 Apr 16
6
Classification Analysis
Hi everyone,
Could somebody explain to me which package/function to use for
classification analysis? I am analysing music in the form of MIDI files.
I end up with about 750 dependent variables from the analysis, and I also
have a number of independent/grouping variables that I set manually. What
I would like is to be able to predict which group a particular MIDI file
belongs to
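A hedged sketch of one way to do this in R with a random forest, which copes well with ~750 predictors; the data frame midi, the factor column group and new_midi are placeholders for the poster's data:

library(randomForest)

# midi: one row per MIDI file, ~750 numeric feature columns plus a factor
# column 'group' holding the manually assigned grouping variable.
fit <- randomForest(group ~ ., data = midi, ntree = 500, importance = TRUE)
predict(fit, newdata = new_midi)     # predicted group for new MIDI features
varImpPlot(fit, n.var = 20)          # which of the ~750 features matter most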
2005 Jul 19
2
data mining
Dear all,
I'm looking for some material on data mining with R. I have something
from Luis Torgo but I'd like to see something else.
If anybody could help me, I'd be thankful.
Adrián
2009 Jul 17
3
Help with the text mining package (TM)
Dear all, I'm writing to ask about the following:
I'm doing a text-mining project and I need to import a series of texts in
order to preprocess them, i.e. remove the stopwords, do stemming, remove
punctuation marks, etc. That last part I can do with the datasets that come
with the TM library. What I can't manage is to import text
from any source, even though there are functions
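A minimal sketch of importing texts from a directory on disk into a tm corpus and then applying the preprocessing steps mentioned; the directory name textos/ is an assumption:

library(tm)

corp <- Corpus(DirSource("textos", encoding = "UTF-8"))   # one file per text
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("spanish"))
corp <- tm_map(corp, stemDocument, language = "spanish")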
2007 Jan 19
9
Double-quoted query with "and" fails.
Hi,
We're using Ferret 0.9.4 and we've observed the following behavior.
Searching for 'fieldname: foo and bar' works fine while 'fieldname:
"foo and bar"' doesn't return any results. Is there a way to make
ferret recognize the 'and' inside the query as a search term and not
an operator? (I hope I got the
2004 Dec 13
2
classification for huge datasets: SVM yields memory troubles
Hi
I have a matrix with 30 observations and roughly 30,000 variables; each
observation belongs to one of two groups. With svm and slda I run into
memory trouble ('cannot allocate vector of size' roughly 2G). PCA followed
by LDA runs fine. Is there any way to work around the memory issue with
SVMs? Or can you recommend any other classification method for such huge
datasets?
P.S. I run SuSE 9.1 with 2G of RAM
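A hedged sketch of one common workaround: filter the 30,000 variables down to a manageable subset (here by variance, purely as an illustration) before handing them to e1071::svm; X and y are placeholders for the poster's matrix and group labels:

library(e1071)

keep <- order(apply(X, 2, var), decreasing = TRUE)[1:1000]  # top 1000 features
fit  <- svm(x = X[, keep], y = y, kernel = "linear", cross = 5)
fit$tot.accuracy                     # 5-fold cross-validated accuracy (%)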
2017 Oct 02
2
R and Supervised learning
Hi,
I currently find myself manually sorting several hundred Google Alerts
(GA) texts into those that are actually relevant for my research and those
that are not (even though they were triggered by relevant search
keywords).
Basically, each week I get several hundred GA emails such as:
2007 Aug 16
1
Regression tree: labels in the terminal nodes
Dear everybody,
I'm a new user of R 2.4.1 and I'm looking for information on improving
the output of regression tree graphs.
In the terminal nodes I am so far able to show the number of values (n)
and the mean of all values in that terminal node with the command
> text(tree, use.n=T, xpd=T)
but I would also like to automatically show, in the output graph of the
tree, some
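A small sketch of the kind of labelling that can be tuned, shown with an rpart tree; rpart.plot is a separate add-on package and the car.test.frame example data is used only for illustration:

library(rpart)
library(rpart.plot)

fit <- rpart(Mileage ~ Weight + Type, data = car.test.frame, method = "anova")
plot(fit); text(fit, use.n = TRUE, all = TRUE, xpd = TRUE, cex = 0.8)
rpart.plot(fit, type = 4, extra = 101)  # fitted value, n and % in every node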
2012 Dec 13
2
Size of the term matrix and memory. TM package
Hello everyone!
I'm having some problems with the size of the term matrix I get. The commands I use are the following:
# load libraries
library(tm)
library(wordcloud)
library(Rstem)
library(Snowball)
# read the UTF-8 document and convert it to ASCII
txt <-
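A minimal follow-on sketch (assuming the preprocessed corpus is called corp): the usual way to shrink the term matrix, and the memory it needs, is to drop very sparse terms:

dtm <- DocumentTermMatrix(corp)            # 'corp' = the preprocessed corpus
dim(dtm)                                   # documents x terms before pruning
dtm_small <- removeSparseTerms(dtm, 0.99)  # drop terms missing from >99% of docs
dim(dtm_small)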
2007 May 09
3
bug when assigning new analyzer?
require 'rubygems'
require 'ferret'
include Ferret
PATH = '/tmp/ferret_stopwords_test'
index = Index::IndexWriter.new(:path => PATH, :create => true)
index.analyzer = Analysis::StandardAnalyzer.new([])
index << {:title => 'a few good men', :language => 'en'}
index.analyzer =