Zach Simpson
2018-Oct-11 03:37 UTC
[R] Defining Variables from a Matrix for 10-Fold Cross Validation
Hey Matthew,

In addition to what's been mentioned, you may want to look at the 'caret' package, as it provides a nice system for whatever flavor of cross-validation you're after *and* has a built-in method for `kknn`:

http://topepo.github.io/caret/available-models.html

Hope this helps,
Zach Simpson

On October 9, 2018 15:34:15 -0700, David Winsemius <dwinsemius at comcast.net> wrote:
> > On Oct 9, 2018, at 3:04 PM, matthew campbell <mcc3qb at virginia.edu> wrote:
> >
> > Good afternoon,
> >
> > I am trying to run a 10-fold CV, using a matrix as my data set.
> > Essentially, I want "y" to be the first column of the matrix, and my "x" to
> > be all remaining columns (2-257). I've posted some of the code I used
> > below, and the data set (called "zip.train") is in the "ElemStatLearn"
> > package. The error message is highlighted in red, and the corresponding
> > section of code is bolded. (I am not concerned with the warning message,
> > just the error message).
> >
> > The issue I am experiencing is the error message below the code: I haven't
> > come across that specific message before, and am not exactly sure how to
> > interpret its meaning. What exactly is this error message trying to tell
> > me? Any suggestions or insights are appreciated!
> >
> > Thank you all,
> >
> > Matthew Campbell
> >
> >
> >> library (ElemStatLearn)
> >> library(kknn)
> >> data(zip.train)
> >> train=zip.train[which(zip.train[,1] %in% c(2,3)),]
> >> test=zip.test[which(zip.test[,1] %in% c(2,3)),]
> >> nfold = 10
> >> infold = sample(rep(1:10, length.out = (x)))
>
> I don't see a definition for x.
>
> > Warning message:
> > In rep(1:10, length.out = (x)) :
> > first element used of 'length.out' argument
>
> But apparently it has a length greater than 1, and you are getting a sample whose length is specified by the first element of x.
>
> >>
> > *> mydata = data.frame(x = train[ , c(2,257)] , y = train[ , 1])*
> >>
> >> K = 20
> >> errorMatrix = matrix(NA, K, 10)
> >>
> >> for (l in nfold)
> > + {
> > + for (k in 1:20)
> > + {
> > + knn.fit = kknn(y ~ x, train = mydata[infold != l, ], test = mydata[infold == l, ], k = k)
> > + errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold == l])^2)
> > + }
> > + }
> > Error in model.frame.default(formula, data = train) :
> > variable lengths differ (found for 'x')
>
> So the warning above is probably a great clue to the source of this error.
>
> Moral of the tale: Always read the warnings, even if your code proceeds.
>
> David Winsemius
> Alameda, CA, USA
>
> "The whole problem with the world is that fools and fanatics are always so certain of themselves, and wiser people so full of doubts." - Bertrand Russell
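
For reference, here is one way the loop from the quoted thread might be set up so that the warning and the error both go away. This is only a sketch under some assumptions, not the original poster's final code: the data are simulated to mimic the layout of zip.train (digit in column 1, 256 pixel values in columns 2:257) so that it runs without ElemStatLearn, the names `n` and `cvError` are made up for illustration, and the formula is written as `y ~ .` so that every variable is looked up inside `mydata` rather than picking up a stray `x` from the workspace. The kknn fit is skipped if that package is not installed.

```r
# Simulated stand-in for the zip.train subset (an assumption, not the real data):
# column 1 holds the digit (2 or 3), columns 2:257 hold 256 pixel values.
set.seed(1)
n <- 200
train <- cbind(sample(c(2, 3), n, replace = TRUE),
               matrix(runif(n * 256, -1, 1), nrow = n))

# 2:257 selects all 256 predictors; c(2, 257) would select only two columns.
mydata <- data.frame(y = train[, 1], train[, 2:257])

# One fold label per row. length.out must be a single number, here nrow(mydata).
nfold <- 10
infold <- sample(rep(1:nfold, length.out = nrow(mydata)))

K <- 20
errorMatrix <- matrix(NA_real_, K, nfold)

# Fit only when kknn is available, so the sketch still runs without it.
if (requireNamespace("kknn", quietly = TRUE)) {
  for (l in 1:nfold) {    # 1:nfold visits every fold; `for (l in nfold)` runs once
    for (k in 1:K) {
      # y ~ . uses every other column of the training data as a predictor
      knn.fit <- kknn::kknn(y ~ ., train = mydata[infold != l, ],
                            test = mydata[infold == l, ], k = k)
      errorMatrix[k, l] <- mean((knn.fit$fitted.values -
                                 mydata$y[infold == l])^2)
    }
  }
  cvError <- rowMeans(errorMatrix)  # cross-validated squared error for each k
}
```

With the fold vector the same length as the data, `mydata[infold != l, ]` and `mydata$y[infold == l]` line up row for row, which is what the "variable lengths differ" message was complaining about.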