Thanks Oldrich and Max,
I have some more queries.
If I need to train svm() with only one instance I get the following error:
Error in if (any(co)) { : missing value where TRUE/FALSE needed
Would it be wise to duplicate the instance with minute changes in the values,
or is there some other way to overcome this problem?
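
A minimal base-R sketch of one plausible cause of this error. It assumes that svm() checks the per-column variance of the training data while scaling; that is an assumption about its internals, not something confirmed in this thread. The values and the helper can_train() are made up for illustration.

```r
# One training instance stored as a 1-row matrix (made-up values).
x <- matrix(c(0.2, 1.5, 3.1), nrow = 1)

# The variance of a single observation is undefined, so R returns NA
# for every column. A test such as `if (any(...))` on NA values then
# fails with "missing value where TRUE/FALSE needed".
col_var <- apply(x, 2, var)

# Hypothetical guard to run before training: require at least two rows
# and at least two distinct class labels. Both thresholds are assumptions.
can_train <- function(x, y) {
  nrow(x) >= 2 && length(unique(y)) >= 2
}

trainable <- can_train(x, factor("A"))
```

Duplicating the row with small changes would probably silence the error, but it adds no new information to the model, so checking the data first as above may be the safer route.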
Second, if I remove the columns with identical values from the training dataset,
I would also have to remove the same columns from the test dataset, right?
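
A small sketch of that idea, assuming both sets are data frames with the same column names. The names train, test, F1, F2 and F3 and all values are invented for illustration. The columns to drop are decided on the training data only, and the same selection is then applied to the test data.

```r
# Toy data; F2 is constant in the training set.
train <- data.frame(F1 = c(1, 2, 3), F2 = c(5, 5, 5), F3 = c(0, 1, 0))
test  <- data.frame(F1 = c(4, 5),    F2 = c(5, 6),    F3 = c(1, 1))

# Flag columns with a single distinct value, looking at TRAINING data only.
is_constant <- vapply(train, function(col) length(unique(col)) == 1, logical(1))
keep <- names(train)[!is_constant]

# Apply the same column selection to both sets so they stay aligned.
train_reduced <- train[, keep, drop = FALSE]
test_reduced  <- test[, keep, drop = FALSE]
```

Selecting by column name rather than by position keeps the two sets aligned even if their columns are stored in a different order.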
Regards,
Soumyadeep
Oldrich Kruza <sixtease@gmail.com> wrote:
Hello Soumyadeep,
Principal Component Analysis tells you which linear combinations of
your features are most relevant. So it's not really feature selection.
If you want to use PCA, then you have to transform your data so that
in each column, there's the linear combination that PCA chose. I think
you have to do that yourself and I have no experience with using PCA
myself.
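
For what it's worth, a minimal sketch of that transformation using prcomp() from base R's stats package. The random data, the column names and the choice of k = 2 components are assumptions made only for illustration. prcomp() stores the transformed training data in its x component, and predict() projects new rows with the same centering, scaling and rotation.

```r
set.seed(1)
# Made-up numeric data: 50 training rows and 10 test rows, 4 features each.
train <- matrix(rnorm(50 * 4), nrow = 50, dimnames = list(NULL, paste0("F", 1:4)))
test  <- matrix(rnorm(10 * 4), nrow = 10, dimnames = list(NULL, paste0("F", 1:4)))

# Fit PCA on the training data only.
pca <- prcomp(train, center = TRUE, scale. = TRUE)

k <- 2  # number of components to keep; an arbitrary choice here

# Each column of pca$x is one linear combination chosen by PCA.
train_pc <- pca$x[, 1:k, drop = FALSE]

# Project the test data onto the same components.
test_pc <- predict(pca, newdata = test)[, 1:k, drop = FALSE]
```

train_pc and test_pc are ordinary matrices, so they can be passed on to model building in place of the original columns.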
I have no experience with or knowledge about Singular Value
Decomposition whatsoever, so I'm afraid I can't provide any insight
into that.
~ Oldrich
On Fri, Mar 7, 2008 at 9:48 AM, Soumyadeep nandi
wrote:
> Great, I too had the same problem of large data size. But somehow I managed
> to reduce it to a manageable size. I did this before generating the data
> for model building. I still wonder how to reduce the matrix size by PCA. Anyway,
> if required I would have to do that too. BTW, do you know of any tutorial on
> reducing features by PCA or SVD? I find it difficult to work with the matrix,
> because after running PCA on it I want to get a subset of my data as a
> matrix which I can process further (like building a model etc.). What I get is
> some principal components. Anyway, many thanks for the help you have
> extended.
>
> Best regards,
>
> Soumyadeep Nandi
> Research Scholar
> Center for Computational Biology and Bioinformatics
> School of Information Technology
> Jawaharlal Nehru University
> New Delhi 110067
> India
>
> Oldrich Kruza wrote:
> Hello,
>
> I study computational linguistics in the Charles University in Prague,
> Czech Republic. Now I'm working on my master thesis during my
> 1-semester stay in the Saarland University, Germany.
>
> It's funny - I'm struggling with SVMs right now myself. My data set
> has over 2 GB, I managed to reduce it to about 270 MB by feature
> selection and getting rid of labels and the like. Still, training the
> SVM crashes because of memory exhaustion even on a machine with 16GB
> of RAM. So that's why I had the memory in my head when replying to
> your question. :-)
>
> ~ Oldrich
>
> On Fri, Mar 7, 2008 at 8:41 AM, Soumyadeep nandi
> wrote:
> > Thanks a lot Oldrich,
> > Yes, it's a good idea to remove the columns before taking the data into R, and
> > you are right that this would reduce the memory load.
> >
> > Thanks a lot, your help is really appreciated. :-)
> >
> > BTW, if you don't mind a personal question, what do you do?
> >
> > With best of my regards,
> > Mr Soumyadeep Nandi
> > Research Scholar
> > Center for Computational Biology and Bioinformatics
> > School of Information Technology
> > Jawaharlal Nehru University
> > New Delhi 110067
> > India
> >
> >
>
> > Oldrich Kruza wrote:
> > Hello Soumyadeep,
> >
> > if you store the data in a tabular file, then I suggest using standard
> > text-editing tools like cut (say your file is called data.csv, fields
> > are separated with commas and you want to get rid of the third and
> > sixth column):
> >
> > $ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
> >
> > If you're not in a Unix environment but have perl, then you may use a
> > script like:
> >
> > open SRC, "<", "data.csv" or die("couldn't open source");
> > open DST, ">", "data_cut.csv" or die("couldn't open destination");
> > while (<SRC>) {
> >     chomp;
> >     @fields = split /,/; # substitute the comma for the delimiter you use
> >     # remove the higher index first so the lower one doesn't shift
> >     splice @fields, 5, 1; # get rid of sixth column (indices are
> >                           # zero-based, thus 5 instead of 6)
> >     splice @fields, 2, 1; # get rid of third column
> >     print DST join(",", @fields), "\n";
> > }
> >
> > If you need to do the selection within R, then you can do it by
> > indexing the data structure. Suppose you have the data in a data.frame
> > called data. Then:
> >
> > > data <- data[,-6]
> > > data <- data[,-3]
> >
> > might do the trick (but since I'm not much of an R hacker, this is
> > without guarantee). I think it might be better however to do the
> > preprocessing before the data get into R because then you avoid
> > loading the columns to discard into memory.
> >
> > Hope this helps
> > ~ Oldrich
> >
> > On Fri, Mar 7, 2008 at 7:55 AM, Soumyadeep nandi
> > wrote:
> > > Thanks Oldrich,
> > > Actually I was not sure if I could remove these columns and still build
> > > the model. Thanks a lot for your kind suggestion. Could you tell me if
> > > there is any function to remove these columns from the data matrix?
> > >
> > > With best regards,
> > > Soumyadeep
> > >
> > >
> >
> > > Oldrich Kruza wrote:
> > > A rather technical workaround I see could be adding a row with a
> > > different value. But if a column only ever has one value, then it
> > > contributes nothing to the model and I see no reason why it would have
> > > to be kept.
> > > ~ Oldrich Kruza
> > >
> > > On Fri, Mar 7, 2008 at 6:45 AM, Soumyadeep nandi
> > > wrote:
> > > > What should I do if I need to train svm() with data having the same
> > > > value across all rows in some columns? These must be the important
> > > > features of the class and we can't exclude these columns to build up
> > > > models.
> > > >
> > > > The error I am getting is:
> > > > Error in predict.svm(ret, xhold) : Model is empty!
> > > > In addition: Warning message:
> > > > In svm.default(datatrain, classtrain) :
> > > > Variable(s) 'F112' and 'F113'.... [... truncated]
> > > >
> > > > Is there any way to overcome this problem? Any suggestions would be
> > > > highly helpful.
> > > >
> > > > Regards
> > > > Soumyadeep
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>