Folks, I appreciate your effort, but maybe I was not explicit enough, so let me try again. The current setup for formulas does not allow for I(x^2) terms as explained in the MASS book at the end of Section 6.2 the x:x interaction is treated as x. So I need to write my own code, which is clumsy unless you can refer me to a package that already exists or give me an idea how to improve my code. Also, writing out terms is not feasible when there are 1000 variables, so the code needs to deal with taking a wide data frame or matrix with column names for convenience. Let me know your ideas :-) On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote:> Full disclosure: I did not attempt to decipher your code. > > But ~(A+B +C)^2 - (A?+ B?+ C) > gives all 2nd order interactions whether the terms are factors or > numeric. > > ~I(A^2) + I(B^2) gives quadratics in A and B, which must be numeric, > not factors, of course > > You can combine these as necessary to get a formula expression for > just 2nd order terms. Wrapping this in model.matrix() should then > give you the model matrix using "treatment" contrasts for the > contrasts involving factors (you can change the contrast types using > the 'contrasts.arg' argument of model.matrix()) > > 1. Does this help? > 2. Do check this to make sure I'm correct > > Cheers, > Bert > > "An educated person is one who can entertain new ideas, entertain > others, and entertain herself." > > > > On Mon, Mar 24, 2025 at 12:22?AM Stephen Bond via R-help > <r-help at r-project.org> wrote: > > I am sending to this forum as stackoverflow has devolved into sth > > pretty bad. > > Below code shows how to get what I want in a clumsy way. > > > > cols <- letters[1:4] > > a1 <- outer(cols,cols,paste0) > > b1 <- a1[!lower.tri(a1)] > > > > X <- matrix(rnorm(80),ncol=4) > > colnames(X) <- cols > > X <- as.data.frame(X) > > XX <- matrix(0,nrow=nrow(X),ncol=length(b1)) > > colnames(XX) <- b1 > > > > for (k in 1:length(b1)){ > > ? ? XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)] > > } > > > > > > > > Is there a way to get that using a formula or some neat trick? The > > above will not work for factors, so I will need to create the > > factor > > crossings using formula a*b*c and then cross with the numerics, > > which > > is even more clumsy. > > Thanks everybody > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > https://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code.
I'm still not entirely clear what you want, in particular what you want to with factor variables ('squaring' them doesn't really seem to make sense). Some combination of sprintf/paste/reformulate might do what you want: ## get all second-order terms (doesn't square numeric values) quad_terms <- sprintf("(%s)^2", paste(b1, collapse = "+")) ## get squared terms square_terms <- paste(sprintf("I(%s^2)", b1)) reformulate(c(quad_terms, square_terms)) The other alternative would be to construct a second-order polynomial across all of your terms (and then insert this into reformulate()) poly_term <- sprintf("poly(%s, degree = 2, raw = TRUE)", paste(b1, collapse = ", ")) On 2025-03-24 11:28 a.m., Stephen Bond via R-help wrote:> Folks, > > I appreciate your effort, but maybe I was not explicit enough, so let > me try again. > > The current setup for formulas does not allow for I(x^2) terms as > explained in the MASS book at the end of Section 6.2 the x:x > interaction is treated as x. > > So I need to write my own code, which is clumsy unless you can refer me > to a package that already exists or give me an idea how to improve my > code. Also, writing out terms is not feasible when there are 1000 > variables, so the code needs to deal with taking a wide data frame or > matrix with column names for convenience. > > Let me know your ideas :-) > > On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote: >> Full disclosure: I did not attempt to decipher your code. >> >> But ~(A+B +C)^2 - (A?+ B?+ C) >> gives all 2nd order interactions whether the terms are factors or >> numeric. >> >> ~I(A^2) + I(B^2) gives quadratics in A and B, which must be numeric, >> not factors, of course >> >> You can combine these as necessary to get a formula expression for >> just 2nd order terms. Wrapping this in model.matrix() should then >> give you the model matrix using "treatment" contrasts for the >> contrasts involving factors (you can change the contrast types using >> the 'contrasts.arg' argument of model.matrix()) >> >> 1. Does this help? >> 2. Do check this to make sure I'm correct >> >> Cheers, >> Bert >> >> "An educated person is one who can entertain new ideas, entertain >> others, and entertain herself." >> >> >> >> On Mon, Mar 24, 2025 at 12:22?AM Stephen Bond via R-help >> <r-help at r-project.org> wrote: >>> I am sending to this forum as stackoverflow has devolved into sth >>> pretty bad. >>> Below code shows how to get what I want in a clumsy way. >>> >>> cols <- letters[1:4] >>> a1 <- outer(cols,cols,paste0) >>> b1 <- a1[!lower.tri(a1)] >>> >>> X <- matrix(rnorm(80),ncol=4) >>> colnames(X) <- cols >>> X <- as.data.frame(X) >>> XX <- matrix(0,nrow=nrow(X),ncol=length(b1)) >>> colnames(XX) <- b1 >>> >>> for (k in 1:length(b1)){ >>> ? ? XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)] >>> } >>> >>> >>> >>> Is there a way to get that using a formula or some neat trick? The >>> above will not work for factors, so I will need to create the >>> factor >>> crossings using formula a*b*c and then cross with the numerics, >>> which >>> is even more clumsy. >>> Thanks everybody >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> https://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering > E-mail is sent at my convenience; I don't expect replies outside of working hours.
Ebert,Timothy Aaron
2025-Mar-24 23:52 UTC
[R] how to create model matrix of second order terms
You could try writing code to write code. However, you need to tell the program which variables are factor and which are continuous. If you are lucky and all integer data are factor and all non-integers are continuous, then you can sort for that. However, in a broad data perspective this is a bad assumption. I have (somehow) sorted the variables into factor and non-factor variables. Non-factor variables are in vars1, and factor variables are in vars2. The code might look something like this. vars1 <- c("x1", "x2", "x3") vars2 <- c("F1", "F2", "F3") interactions <- as.vector(outer(vars1, vars1, function(x, y) ifelse(x < y, paste(x, y, sep = ":"), NA))) interactions <- interactions[!is.na(interactions)] result <- c(vars1, interactions) result <- c(result, vars2) formula <- as.formula(paste("y ~", paste(result, collapse = " + "))) model <- lda(formula, data = mydata) Tim -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of Stephen Bond via R-help Sent: Monday, March 24, 2025 11:29 AM To: Bert Gunter <bgunter.4567 at gmail.com> Cc: r-help at r-project.org Subject: Re: [R] how to create model matrix of second order terms [External Email] Folks, I appreciate your effort, but maybe I was not explicit enough, so let me try again. The current setup for formulas does not allow for I(x^2) terms as explained in the MASS book at the end of Section 6.2 the x:x interaction is treated as x. So I need to write my own code, which is clumsy unless you can refer me to a package that already exists or give me an idea how to improve my code. Also, writing out terms is not feasible when there are 1000 variables, so the code needs to deal with taking a wide data frame or matrix with column names for convenience. Let me know your ideas :-) On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote:> Full disclosure: I did not attempt to decipher your code. > > But ~(A+B +C)^2 - (A + B + C) > gives all 2nd order interactions whether the terms are factors or > numeric. > > ~I(A^2) + I(B^2) gives quadratics in A and B, which must be numeric, > not factors, of course > > You can combine these as necessary to get a formula expression for > just 2nd order terms. Wrapping this in model.matrix() should then give > you the model matrix using "treatment" contrasts for the contrasts > involving factors (you can change the contrast types using the > 'contrasts.arg' argument of model.matrix()) > > 1. Does this help? > 2. Do check this to make sure I'm correct > > Cheers, > Bert > > "An educated person is one who can entertain new ideas, entertain > others, and entertain herself." > > > > On Mon, Mar 24, 2025 at 12:22?AM Stephen Bond via R-help > <r-help at r-project.org> wrote: > > I am sending to this forum as stackoverflow has devolved into sth > > pretty bad. > > Below code shows how to get what I want in a clumsy way. > > > > cols <- letters[1:4] > > a1 <- outer(cols,cols,paste0) > > b1 <- a1[!lower.tri(a1)] > > > > X <- matrix(rnorm(80),ncol=4) > > colnames(X) <- cols > > X <- as.data.frame(X) > > XX <- matrix(0,nrow=nrow(X),ncol=length(b1)) > > colnames(XX) <- b1 > > > > for (k in 1:length(b1)){ > > XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)] > > } > > > > > > > > Is there a way to get that using a formula or some neat trick? The > > above will not work for factors, so I will need to create the factor > > crossings using formula a*b*c and then cross with the numerics, > > which is even more clumsy. > > Thanks everybody > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://st/ > > at.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C02%7Ctebert%40ufl > > .edu%7C4e92d4943a594be971d508dd6b2270b0%7C0d4da0f84a314d76ace60a6233 > > 1e1b84%7C0%7C0%7C638784517844383573%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0 > > eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIs > > IldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=%2FXX2ZHU5NpqeUYbjKRLzBa%2BXpGu% > > 2F2ddcdqXfbdnUW50%3D&reserved=0 > > PLEASE do read the posting guide > > https://ww/ > > w.r-project.org%2Fposting-guide.html&data=05%7C02%7Ctebert%40ufl.edu > > %7C4e92d4943a594be971d508dd6b2270b0%7C0d4da0f84a314d76ace60a62331e1b > > 84%7C0%7C0%7C638784517844402770%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1h > > cGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldU > > IjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=CW0bStnd5O6BPQECRuyfTJDnWXPDIZMlDIUm > > DUoQTJc%3D&reserved=0 and provide commented, minimal, > > self-contained, reproducible code.______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
?s 15:28 de 24/03/2025, Stephen Bond via R-help escreveu:> Folks, > > I appreciate your effort, but maybe I was not explicit enough, so let > me try again. > > The current setup for formulas does not allow for I(x^2) terms as > explained in the MASS book at the end of Section 6.2 the x:x > interaction is treated as x. > > So I need to write my own code, which is clumsy unless you can refer me > to a package that already exists or give me an idea how to improve my > code. Also, writing out terms is not feasible when there are 1000 > variables, so the code needs to deal with taking a wide data frame or > matrix with column names for convenience. > > Let me know your ideas :-) > > On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote: >> Full disclosure: I did not attempt to decipher your code. >> >> But ~(A+B +C)^2 - (A?+ B?+ C) >> gives all 2nd order interactions whether the terms are factors or >> numeric. >> >> ~I(A^2) + I(B^2) gives quadratics in A and B, which must be numeric, >> not factors, of course >> >> You can combine these as necessary to get a formula expression for >> just 2nd order terms. Wrapping this in model.matrix() should then >> give you the model matrix using "treatment" contrasts for the >> contrasts involving factors (you can change the contrast types using >> the 'contrasts.arg' argument of model.matrix()) >> >> 1. Does this help? >> 2. Do check this to make sure I'm correct >> >> Cheers, >> Bert >> >> "An educated person is one who can entertain new ideas, entertain >> others, and entertain herself." >> >> >> >> On Mon, Mar 24, 2025 at 12:22?AM Stephen Bond via R-help >> <r-help at r-project.org> wrote: >>> I am sending to this forum as stackoverflow has devolved into sth >>> pretty bad. >>> Below code shows how to get what I want in a clumsy way. >>> >>> cols <- letters[1:4] >>> a1 <- outer(cols,cols,paste0) >>> b1 <- a1[!lower.tri(a1)] >>> >>> X <- matrix(rnorm(80),ncol=4) >>> colnames(X) <- cols >>> X <- as.data.frame(X) >>> XX <- matrix(0,nrow=nrow(X),ncol=length(b1)) >>> colnames(XX) <- b1 >>> >>> for (k in 1:length(b1)){ >>> ? ? XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)] >>> } >>> >>> >>> >>> Is there a way to get that using a formula or some neat trick? The >>> above will not work for factors, so I will need to create the >>> factor >>> crossings using formula a*b*c and then cross with the numerics, >>> which >>> is even more clumsy. >>> Thanks everybody >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> https://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.Hello, Are you looking for ?poly to generate all 1st and 2nd order combinations? In the outputs below, dimnames[[2]] tell which term corresponds to which column. So 1.0.0.0 is 'a' only and 2.0.0.0 is 'a^2'. cols <- letters[1:4] a1 <- outer(cols,cols,paste0) b1 <- a1[!lower.tri(a1)] X <- matrix(rnorm(80),ncol=4) colnames(X) <- cols X <- as.data.frame(X) XX <- matrix(0,nrow=nrow(X),ncol=length(b1)) colnames(XX) <- b1 for (k in 1:length(b1)){ XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)] } # XX # model.matrix( ~ (a + b + c + d)^2 , data=X) # model.matrix( ~ (a + b + c + d)^2 - (a + b + c +d), data=X) # Is this what you want? # Ugly colnames m <- model.matrix( ~ poly(a, b, c, d, degree = 2L) , data = X) dimnames(m) #> [[1]] #> [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" #> [16] "16" "17" "18" "19" "20" #> #> [[2]] #> [1] "(Intercept)" "poly(a, b, c, d, degree = 2)1.0.0.0" #> [3] "poly(a, b, c, d, degree = 2)2.0.0.0" "poly(a, b, c, d, degree = 2)0.1.0.0" #> [5] "poly(a, b, c, d, degree = 2)1.1.0.0" "poly(a, b, c, d, degree = 2)0.2.0.0" #> [7] "poly(a, b, c, d, degree = 2)0.0.1.0" "poly(a, b, c, d, degree = 2)1.0.1.0" #> [9] "poly(a, b, c, d, degree = 2)0.1.1.0" "poly(a, b, c, d, degree = 2)0.0.2.0" #> [11] "poly(a, b, c, d, degree = 2)0.0.0.1" "poly(a, b, c, d, degree = 2)1.0.0.1" #> [13] "poly(a, b, c, d, degree = 2)0.1.0.1" "poly(a, b, c, d, degree = 2)0.0.1.1" #> [15] "poly(a, b, c, d, degree = 2)0.0.0.2" # These colnames are nicer p <- with(X, poly(a, b, c, d, degree = 2L)) attributes(p) #> $dim #> [1] 20 14 #> #> $dimnames #> $dimnames[[1]] #> NULL #> #> $dimnames[[2]] #> 2 3 4 5 7 10 11 13 #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0" "1.0.1.0" "0.1.1.0" #> 19 28 29 31 37 55 #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2" #> #> #> $degree #> [1] 1 2 1 2 2 1 2 2 2 1 2 2 2 2 #> #> $coefs #> $coefs[[1]] #> $coefs[[1]]$alpha #> [1] 0.1134201 0.4901752 #> #> $coefs[[1]]$norm2 #> [1] 1.00000 20.00000 34.11147 140.62467 #> #> #> $coefs[[2]] #> $coefs[[2]]$alpha #> [1] 0.1356218 1.0041956 #> #> $coefs[[2]]$norm2 #> [1] 1.00000 20.00000 21.53021 39.73742 #> #> #> $coefs[[3]] #> $coefs[[3]]$alpha #> [1] 0.2533081 0.1689801 #> #> $coefs[[3]]$norm2 #> [1] 1.00000 20.00000 19.08499 19.83333 #> #> #> $coefs[[4]] #> $coefs[[4]]$alpha #> [1] -0.03597259 -0.87409789 #> #> $coefs[[4]]$norm2 #> [1] 1.00000 20.00000 24.68273 73.41893 #> #> #> #> $class #> [1] "poly" "matrix" dimnames(p) #> [[1]] #> NULL #> #> [[2]] #> 2 3 4 5 7 10 11 13 #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0" "1.0.1.0" "0.1.1.0" #> 19 28 29 31 37 55 #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2" Hope this helps, Rui Barradas -- Este e-mail foi analisado pelo software antiv?rus AVG para verificar a presen?a de v?rus. www.avg.com