Hopefully simple question: What is the best way to name and treat factor columns for data that has lots of columns?

This is my column list:
id pID50 D.1 D.2 D.3 D.4 D.5, etc., all the way to D.185

I was under the impression from several R examples in pls that if you name your columns like above, you should be able to simply call all the D factors with "D", instead of going in and putting a plus sign between each column.

    miceD <- plsr(pID50 ~ D, ncomp = 10, data = micetitletest)
    Error in model.frame.default(formula = pID50 ~ D, data = micetitletest) :
      invalid type (closure) for variable 'D'

VS.

    miceD <- plsr(pID50 ~ D.1 + D.2 + D.3 + D.4 etc. to D.185, ncomp = 10, data = micetitletest)

What am I missing above that's causing that error message in bold? Is there a better strategy for naming my columns in order to make R use easier?

--
View this message in context: http://r.789695.n4.nabble.com/Column-header-strategy-tp2282740p2282740.html
Sent from the R help mailing list archive at Nabble.com.

On Jul 8, 2010, at 3:14 PM, Addi Wei wrote:

> I was under the impression from several R examples in pls that if you name
> your columns like above, you should be able to simply call all the D factors
> with "D", instead of going in and putting a plus sign between each column.
>
> miceD <- plsr(pID50~D, ncomp=10,data = micetitletest)
> Error in model.frame.default(formula = pID50 ~ D, data = micetitletest) :
>   invalid type (closure) for variable 'D'

"D" is a function name. As are "c", "data", and "df".

--
David Winsemius, MD
West Hartford, CT

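To make the point about "D" concrete, here is a minimal sketch of why the error mentions a closure. It assumes a small made-up data frame, `mice`, standing in for micetitletest (only three D columns, random values), and it calls base R's model.frame() directly rather than plsr(), since that is where the error in the original post is raised.

```r
# Hypothetical stand-in for the poster's data: id, pID50, D.1 ... D.3
set.seed(1)
mice <- data.frame(id = 1:10, pID50 = rnorm(10),
                   D.1 = rnorm(10), D.2 = rnorm(10), D.3 = rnorm(10))

# There is no column literally named "D" ...
has_D_column <- "D" %in% names(mice)   # FALSE

# ... so R looks beyond the data frame and finds stats::D,
# the symbolic-derivative function, which is a closure.
d_is_function <- is.function(D)        # TRUE
d_type <- typeof(D)                    # "closure"

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    model.frame(pID50 ~ D, data = mice)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the name "D" is never matched against the D.1, D.2, ... columns; a formula term has to be the exact name of an object.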
On Thu, Jul 8, 2010 at 12:14 PM, Addi Wei <addiwei at gmail.com> wrote:

> This is my column list:
> id pID50 D.1 D.2 D.3 D.4 D.5 , etc. all the way to D.185
>
> I was under the impression from several R examples in pls that if you name
> your columns like above, you should be able to simply call all the D factors
> with "D", instead of going in and putting a plus sign between each column.

Hmm, I did not see this in the documentation (which does not mean it is not so). If you want every column in the model except id, you could use '.'; all you would have to do is explicitly remove the id column. This becomes messier if you also had another set of factors, say E.1 to E.k. See ?formula for how '.' works.

    miceD <- plsr(pID50 ~ . - id, ncomp = 10, data = micetitletest)

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

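A short sketch of how the '.' shorthand expands, again on a made-up data frame `mice` (an assumption, not the poster's data). It uses terms() from base R to show the expansion, and reformulate() as one possible way to select only the D columns by name when other column families such as E.1 to E.k are present; the column pattern "^D\\." is likewise an assumption about the naming.

```r
# Hypothetical stand-in for the poster's data: id, pID50, D.1 ... D.3
set.seed(1)
mice <- data.frame(id = 1:10, pID50 = rnorm(10),
                   D.1 = rnorm(10), D.2 = rnorm(10), D.3 = rnorm(10))

# '.' stands for every column of `data` not otherwise in the formula;
# '- id' then drops the id column again.
tt <- terms(pID50 ~ . - id, data = mice)
labels_dot <- attr(tt, "term.labels")
print(labels_dot)

# Alternative: pick the predictors by name and build the formula from them
d_cols <- grep("^D\\.", names(mice), value = TRUE)
f <- reformulate(d_cols, response = "pID50")
print(f)

# Either formula can be handed to a modelling function; lm() is used here
# only as a base-R stand-in for plsr().
fit <- lm(f, data = mice)
```

The second form avoids typing 185 plus signs and keeps working if unrelated columns are added to the data frame later.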
On 2010-07-08 13:14, Addi Wei wrote:

> I was under the impression from several R examples in pls that if you name
> your columns like above, you should be able to simply call all the D factors
> with "D", instead of going in and putting a plus sign between each column.
>
> miceD<- plsr(pID50~D, ncomp=10,data = micetitletest)
> Error in model.frame.default(formula = pID50 ~ D, data = micetitletest) :
>   invalid type (closure) for variable 'D'

From the help page for plsr():

  "The formula argument should be a symbolic formula of the form
  response ~ terms, where response is the name of the response vector
  or matrix (for multi-response models) and terms is the name of one
  or more predictor _matrices_ (emphasis added), usually separated
  by +, e.g., water ~ FTIR or y ~ X + Z."

Note the word _matrices_; you may not have set up your data correctly. Compare the 'yarn' dataset

    str(yarn)

with your data

    str(micetitletest)

And, as David says, don't use D for the name of your predictor matrix (although it will probably work).

-Peter Ehlers

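One way to get the layout the help page describes is to store all the D columns as a single matrix column of the data frame. The following is a sketch under assumptions: `mice` is made-up data in place of micetitletest, and the column name `desc` is invented for the illustration (chosen to avoid the name D). The plsr() call runs only if the pls package is installed.

```r
# Hypothetical stand-in for the poster's data: id, pID50, D.1 ... D.3
set.seed(1)
mice <- data.frame(id = 1:20, pID50 = rnorm(20),
                   D.1 = rnorm(20), D.2 = rnorm(20), D.3 = rnorm(20))

# Collect the D.* columns into one numeric matrix
d_cols <- grep("^D\\.", names(mice))
mice2 <- data.frame(id = mice$id, pID50 = mice$pID50)
mice2$desc <- as.matrix(mice[, d_cols])   # a single column holding a 20 x 3 matrix

str(mice2)   # compare with str(yarn) from the pls package

# With a matrix column, one name refers to all predictors at once
if (requireNamespace("pls", quietly = TRUE)) {
  fit <- pls::plsr(pID50 ~ desc, ncomp = 2, data = mice2)
  print(summary(fit))
}
```

Here `ncomp = 2` is only chosen to suit the tiny example; the original post used `ncomp = 10` with 185 predictors.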
On Thu, Jul 8, 2010 at 3:14 PM, Addi Wei <addiwei@gmail.com> wrote:

> Hopefully simple question: What is the best way to name, and treat factor
> columns for data that has lots of columns?
>
> This is my column list:
> id pID50 D.1 D.2 D.3 D.4 D.5 , etc. all the way to D.185

It would be much better to use a factor and let R itself generate the set of dummy variables. Choose a useful name. For example, if D stands for Dose, then use something like

    micetitletest$Dose <- factor(micetitletest$dose.values)
    miceD <- plsr(pID50 ~ Dose, ncomp = 10, data = micetitletest)

Read about factors in "An Introduction to R", the manual distributed as part of R.
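A small sketch of what "let R generate the dummy variables" means, using base R's model.matrix(). The `dose` values are invented for the illustration, and the suggestion only applies if the D.1 to D.185 columns really are indicator columns derived from one categorical variable; the thread does not say whether that is the case.

```r
# Hypothetical categorical variable with three levels
dose <- factor(c("low", "mid", "high", "mid", "low"),
               levels = c("low", "mid", "high"))

# R expands the factor into indicator (dummy) columns on its own,
# using the first level ("low") as the reference under the default
# treatment contrasts.
mm <- model.matrix(~ dose)
print(mm)

dummy_names <- colnames(mm)
```

With the default contrasts, a factor with k levels yields k - 1 indicator columns plus the intercept, so there is no need to create and name those columns by hand.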