thr3ads.net - R help - [R] SVM. How to use categorical attributes? [Mar 2012]


Alekseiy Beloshitskiy

2012-Mar-27 10:05 UTC

[R] SVM. How to use categorical attributes?

Hi All,

Here is the case. I want to build a classification model (SVM). Some of the
variables for this model are categorical attributes that represent words
(usually 3-10 words, the query for a Google search). For example:

search_id | query_words                | .. | result
----------+----------------------------+----+-------
1         | how,to,grow,tree           | .. | 4
2         | smartfone,htc,buy,price    | .. | 7
3         | buy,house,realty,london    | .. | 6
4         | where,to,go,weekend,cinema | .. | 4
...

As you can see, the words in a query are unordered and may occur in several
queries. The total number of unique words across all queries is several thousand.
The question is how to represent this variable (query_words) for use in an SVM.

Thank you for any advice!

Alex


Steve Lianoglou

2012-Mar-27 18:47 UTC

Hi,

On Tue, Mar 27, 2012 at 6:05 AM, Alekseiy Beloshitskiy
<abeloshitskiy at velti.com> wrote:
> Hi All,
>
> Here is the case. I want to build a classification model (SVM). Some of the
> variables for this model are categorical attributes that represent words
> (usually 3-10 words, the query for a Google search). For example:
>
> search_id | query_words                | .. | result
> ----------+----------------------------+----+-------
> 1         | how,to,grow,tree           | .. | 4
> 2         | smartfone,htc,buy,price    | .. | 7
> 3         | buy,house,realty,london    | .. | 6
> 4         | where,to,go,weekend,cinema | .. | 4
> ...
>
> As you can see, the words in a query are unordered and may occur in several
> queries. The total number of unique words across all queries is several
> thousand.
> The question is how to represent this variable (query_words) for use in an
> SVM.
>
> Thank you for any advice!

One approach is to wire up a "bag of words" type of design matrix.

That is to say the matrix has as many columns as there are unique
words. Each row is an observation (query), and the words that appear
in the query have a value of 1 (or you can count the number of times
each word appears).
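
As a rough illustration of the design matrix described above, here is a minimal base-R sketch. It uses the four example queries from the original post and assumes the words are comma-separated as shown there; the object names (`queries`, `vocab`, `bow`, `bow_binary`) are made up for this example.

```r
# Toy data copied from the example table in the question.
queries <- c(
  "how,to,grow,tree",
  "smartfone,htc,buy,price",
  "buy,house,realty,london",
  "where,to,go,weekend,cinema"
)

# Split each query into its words (assumed to be comma-separated).
tokens <- strsplit(queries, ",", fixed = TRUE)

# Vocabulary: every unique word, sorted so the column order is stable.
vocab <- sort(unique(unlist(tokens)))

# One row per query, one column per word; each cell holds the word count.
bow <- t(vapply(
  tokens,
  function(words) as.numeric(table(factor(words, levels = vocab))),
  numeric(length(vocab))
))
dimnames(bow) <- list(as.character(seq_along(queries)), vocab)

# Binary variant: 1 if the word occurs in the query at all, 0 otherwise.
bow_binary <- (bow > 0) * 1
```

A matrix like this could then be handed to an SVM implementation, for example `svm()` from the e1071 package with `kernel = "linear"`. With several thousand distinct words most cells will be zero, so a sparse matrix representation may be worth considering for real data.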

You can maybe get smarter and try to group like words together, but
... now you'll have two problems ...

Hope you have lots of data!

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

Alekseiy Beloshitskiy

2012-Mar-28 12:05 UTC

Thank you so much, Ulrich,

Will play with this. 

Best,
-Alex
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on
behalf of Ulrich Bodenhofer [bodenhofer at bioinf.jku.at]
Sent: 28 March 2012 14:40
To: r-help at r-project.org
Subject: Re: [R] SVM. How to use categorical attributes?

Sorry, I forgot to mention the following: all I wrote is only valid as long
as your number of samples is smaller than the number of different words. If
the number of samples exceeds the total number of different words, it is
better to use the explicit matrix representation and apply some kernel
(e.g. linear) to this matrix.

Best regards,
Ulrich
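
The message this follow-up refers to is not included in this archive, so the following is only a hedged sketch of the size trade-off mentioned above, not a reconstruction of Ulrich's suggestion. It compares an explicit samples-by-words matrix with the linear kernel (Gram) matrix computed from it; the toy matrix `X` and its row and column names are hypothetical.

```r
# Hypothetical toy bag-of-words matrix: 3 queries (rows) x 4 words (columns).
X <- matrix(
  c(1, 1, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("q", 1:3), c("buy", "house", "london", "price"))
)

# Linear kernel matrix: K[i, j] is the inner product of query i and query j.
# For 0/1 indicators this is the number of words the two queries share.
K <- tcrossprod(X)  # same as X %*% t(X)

dim(X)  # samples x words
dim(K)  # samples x samples
```

The explicit matrix grows with the number of distinct words, while the kernel matrix grows with the square of the number of samples, which is one way to read the rule of thumb above about which representation to prefer.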


