thr3ads.net - R help - [R] How do you scale variables which consist of tokens [Mar 2012]

Alekseiy Beloshitskiy

2012-Mar-23 14:34 UTC

[R] How do you scale variables which consist of tokens

Dear All,
Suppose you want to make a prediction using a range of variables, some of
which are represented as sets of words (tokens). For example, take a
training set:
x1,x2,..,x7, y
where y is the value to be predicted (regardless of the model used for
prediction), and let's say:
x4 is a variable made up of the words from a Google search query (the number
of words may differ between observations). For example:
x4=(how,grow,tree), which can also be represented in hashed form:
x4=(11111,22222,33333)

I need to scale this variable (x4) to be able to use it in a model. I was
thinking about scaling it with TF-IDF. That way I can represent each
observation of x4 as a scaled vector with N elements, like:
x4=(0.0175105020782697,...0.019135397913606) //scaled with TF-IDF
However, it still isn't scaled properly (please correct me if I'm wrong),
since I need x4 to be represented as a single (integral) value for each
observation to be able to use it in a model. I assume the result of scaling
should look like:
x4=0.06789324432 //integral value
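
To make the TF-IDF idea above concrete, here is a minimal base-R sketch. The
toy queries, the variable names, and the final step of summing the weights
into one number per observation are all assumptions made for illustration;
summing is only one possible way to collapse the vector, not a recommended
or established answer.

```r
# Hypothetical toy data: each observation of x4 is a set of query tokens.
queries <- list(
  c("how", "grow", "tree"),
  c("how", "plant", "tree", "garden"),
  c("buy", "garden", "tools")
)

# Vocabulary: every distinct token seen in the training set.
vocab <- sort(unique(unlist(queries)))

# Term-frequency matrix: one row per observation, one column per token.
tf <- t(sapply(queries, function(q) {
  as.numeric(table(factor(q, levels = vocab))) / length(q)
}))
colnames(tf) <- vocab

# Inverse document frequency: log(N / number of observations with the token).
n_docs <- length(queries)
doc_freq <- colSums(tf > 0)
idf <- log(n_docs / doc_freq)

# TF-IDF: multiply each column of tf by the idf weight of its token.
tfidf <- sweep(tf, 2, idf, `*`)

# Assumed collapse to a single value per observation (sum of the weights).
x4_scalar <- rowSums(tfidf)
```

Each row of tfidf is the N-element vector for one observation, and x4_scalar
holds one number per observation. Note that any such collapse throws away
which tokens were present, so it may or may not suit a given model.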

Do you have any ideas how to do this?

I would appreciate any ideas.


-Aleksei

