CG Pettersson
2016-Jan-13 19:02 UTC
[R] Problems with data structure when using plsr() from package pls
R version 3.2.3, W7 64bit.

Dear all!

I am trying to do PLS regression using plsr() from package pls, with Mevik & Wehrens (2007) as tutorial and the datasets from the package. Everything works nicely as long as I use the supplied datasets, but I don't understand how to prepare my own data.

This is what I have done:

> frame1 <- data.frame(gushVM, I(n96))

Where gushVM is a vector with fifteen reference analysis values of a quality problem in grain and n96 is a matrix with fifteen rows and 96 columns from an electronic nose. I am trying to copy the methods in section 3.2 of Mevik & Wehrens, and want to keep n96 as one variable to avoid addressing 96 different variables in the plsr call. If I don't use I() in the call I get 96 variables instead.

Looking at the data frame with summary(frame1), I get a return quite like summary(gasoline) from the package (not shown here). But when I try to use plsr() with my own data it doesn't work, due to an error in the data structure:

> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
  invalid type (list) for variable 'n96'

So n96 has turned into a list, and that is a problem. Whether gushVM is a vector (one variable) or a matrix (five variables) does not seem to change anything; managing n96 is the problem.

I have tried all the alternative ways of creating a proper data frame suggested in the article, with exactly the same result. I have tried the documentation for data.frame(), but I probably don't understand what it says.

What should I do to change "n96" into something better than "list"?
Thanks
/CG

Best regards
CG Pettersson
Jeff Newmiller
2016-Jan-14 04:16 UTC
[R] Problems with data structure when using plsr() from package pls
Using I() in the data.frame seems ill-advised to me. You complain about 96 variables but from reading your explanation that seems to be what your data are. I have no idea whether it makes sense to NOT have 96 variables if that is what your data are.

Note that a reproducible example supplied by you might help us guess better, but it might just be that your expectations are wrong.
CG Pettersson
2016-Jan-14 10:33 UTC
[R] Problems with data structure when using plsr() from package pls
Dear Jeff,

thanks for the effort, but the use of I() when preparing the dataset is suggested by the authors (Mevik & Wehrens, section 3.2):

+ If Z is a matrix, it has to be protected by the "protect function" I() in calls
+ to data.frame: mydata <- data.frame(..., Z = I(Z)). Otherwise, it will be split into
+ separate variables for each column, and there will be no variable called Z in the data frame,
+ so we cannot use Z in the formula. One can also add the matrix to an existing data frame:
+ R> mydata <- data.frame(...)
+ R> mydata$Z <- Z

In the dataset "gasoline" that is supplied with the pls package, there are two variables, octane and NIR, where NIR is a frame with 401 columns, and it is possible to work with it like this:

plsr(octane ~ NIR, data = gasoline)

I thought "gasoline" was made like the example above, but I must be missing something else. Whatever I do ends with "invalid type (list) for variable 'n96'".

So I am still stuck.

/CG
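The two constructions quoted from the tutorial can be sketched with base R alone. This is only an illustration under stated assumptions: the data below are random stand-ins for gushVM and n96 (15 samples, 96 columns), and the names frame_a and frame_b are hypothetical.

```r
# Stand-in data (hypothetical): 15 samples, 96 sensor columns.
set.seed(1)
gushVM <- rnorm(15)
n96 <- matrix(rnorm(15 * 96), nrow = 15, ncol = 96)

# Route 1: protect the matrix with I() and name the column explicitly.
frame_a <- data.frame(gushVM = gushVM, n96 = I(n96))

# Route 2: build the data frame first, then assign the matrix as one column.
frame_b <- data.frame(gushVM = gushVM)
frame_b$n96 <- n96

# Both frames hold two variables, with n96 kept as a single matrix column.
dim(frame_a)
dim(frame_a$n96)
dim(frame_b$n96)
```

Either frame could then be passed as the data argument of a call such as plsr(gushVM ~ n96, data = frame_a), assuming the pls package is installed.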
Bjørn-Helge Mevik
2016-Jan-15 12:33 UTC
[R] Problems with data structure when using plsr() from package pls
CG Pettersson <cg.pettersson at lantmannen.com> writes:

>> frame1 <- data.frame(gushVM, I(n96))
[...]
>> pls1 <- plsr(gushVM ~ n96, data = frame1)
> Error in model.frame.default(formula = gushVM ~ n96, data = frame1) :
>   invalid type (list) for variable 'n96'

As far as I can remember, you get this error if the n96 object was a data.frame instead of a matrix. Can you check with, e.g.,

> class(n96)

If it says "data.frame", try using I(as.matrix(n96)).

--
Regards,
Bjørn-Helge Mevik
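The check and the conversion suggested above might look like the following sketch. It assumes, purely for illustration, that n96 arrived as a data frame (for example from read.csv); the data are random stand-ins, and model.frame() from base R is used in place of plsr() so the example runs without the pls package.

```r
set.seed(1)
gushVM <- rnorm(15)
# Hypothetical stand-in: n96 stored as a data frame rather than a matrix.
n96 <- as.data.frame(matrix(rnorm(15 * 96), nrow = 15))

class(n96)

# A data frame wrapped in I() is still a list underneath, so this attempt
# is expected to fail; try() keeps the script running either way.
res_bad <- try({
  bad <- data.frame(gushVM = gushVM, n96 = I(n96))
  model.frame(gushVM ~ n96, data = bad)
}, silent = TRUE)
inherits(res_bad, "try-error")

# Converting to a matrix first gives one matrix-valued variable.
frame1 <- data.frame(gushVM = gushVM, n96 = I(as.matrix(n96)))
mf <- model.frame(gushVM ~ n96, data = frame1)
dim(mf$n96)
```

With frame1 built this way, the original call plsr(gushVM ~ n96, data = frame1) would be the next thing to try.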
Bjørn-Helge Mevik
2016-Jan-15 12:37 UTC
[R] Problems with data structure when using plsr() from package pls
Jeff Newmiller <jdnewmil at dcn.davis.ca.us> writes:

> Using I() in the data.frame seems ill-advised to me. You complain about 96
> variables but from reading your explanation that seems to be what your data
> are.

In PLSR, it is common to regress a variable against matrices with very many columns, often several thousand. Using a data frame with one predictor variable for each column is going to make the formula handling very slow. And if you have several such predictor matrices, it is very practical to keep them as single variables in the data frame, so you can easily select/deselect which groups of variables you want in the model.

--
Regards,
Bjørn-Helge Mevik
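As a rough sketch of the point about groups of variables, the example below keeps two predictor blocks as single matrix columns and selects them by name in the formula. The block names nose and nir, their sizes, and the random data are all hypothetical; model.matrix() from base R is used only to show how many columns each formula expands to.

```r
set.seed(1)
n <- 20
mydata <- data.frame(y = rnorm(n))
mydata$nose <- matrix(rnorm(n * 96), nrow = n)   # hypothetical sensor block
mydata$nir  <- matrix(rnorm(n * 200), nrow = n)  # hypothetical spectral block

# The formulas name whole blocks, not individual columns.
f_one  <- y ~ nose
f_both <- y ~ nose + nir

# model.matrix() expands each block to its columns, plus an intercept.
ncol(model.matrix(f_one,  data = mydata))
ncol(model.matrix(f_both, data = mydata))
```

Dropping or adding a block is then a one-term change to the formula, rather than a change to hundreds of column names.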
Sarah Goslee
2016-Jan-15 12:55 UTC
[R] Problems with data structure when using plsr() from package pls
Backing up a step:

On Wednesday, January 13, 2016, CG Pettersson <cg.pettersson at lantmannen.com> wrote:

> This is what I have done:
>
> > frame1 <- data.frame(gushVM, I(n96))

Which ISN'T what the example you're following did. You didn't name the construct:

frame1 <- data.frame(gushVM, n96 = I(n96))

Without the name, R can't find anything named n96 within frame1, because the column is probably named some variant on I(n96).

str(frame1) would have told you this.

Sarah
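One way to inspect what the columns are actually called is sketched below, with random stand-in data for gushVM and n96. The name given to the unnamed I() column is printed rather than assumed; the object names unnamed and named are hypothetical.

```r
set.seed(1)
gushVM <- rnorm(15)
n96 <- matrix(rnorm(15 * 96), nrow = 15)

# Unnamed I() argument, as in the original post. The resulting column
# name is printed here rather than assumed.
unnamed <- data.frame(gushVM, I(n96))
names(unnamed)

# Named argument, following the data.frame(..., Z = I(Z)) pattern.
named <- data.frame(gushVM, n96 = I(n96))
names(named)

# str() shows both the column names and what each column holds.
str(named)
```

Naming the argument explicitly removes any dependence on how data.frame() derives a name from the expression.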
S Ellison
2016-Jan-15 13:38 UTC
[R] Problems with data structure when using plsr() from package pls
> > This is what I have done:
> >
> > > frame1 <- data.frame(gushVM, I(n96))

Reading the ?plsr examples and inspecting the data they use, you need to arrange frame1 so that it has the data from n96 included as columns with names of the form "n96.xxx", where xxx can be numbers, names etc.

If n96 is a data frame, try something like

names(n96) <- paste("n96", 1:96, sep = ".")
frame1 <- cbind(gushVM, n96)
pls1 <- plsr(gushVM ~ n96, data = frame1)

If n96 is a matrix,

frame1 <- data.frame(gushVM, n96 = n96)

should also give you a data frame with names of the right format.

I() wrapped round a matrix or data frame does nothing like what is needed if you include it in a data frame construction, so either things have changed since the tutorial was written, or the authors were not handling a matrix or data frame with I().

S Ellison
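For comparing the two layouts discussed in this thread, the sketch below builds the column-per-sensor frame and then regroups its "n96." columns into one matrix-valued variable. The data are random stand-ins, and the names wide, idx and grouped are hypothetical.

```r
set.seed(1)
gushVM <- rnorm(15)
n96 <- matrix(rnorm(15 * 96), nrow = 15)

# Without I(), the matrix is split into one column per sensor.
wide <- data.frame(gushVM, n96 = n96)
dim(wide)
head(names(wide))

# Regroup the "n96." columns into a single matrix-valued variable.
idx <- grep("^n96\\.", names(wide))
grouped <- data.frame(gushVM = wide$gushVM, n96 = I(as.matrix(wide[, idx])))
dim(grouped)
dim(grouped$n96)
```

Which layout a given modelling function expects is best confirmed with str() on a dataset that is known to work with it, such as gasoline from the pls package.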