Dear list:

I am interested in the following sort of problem, which is found frequently in the field of QSAR. I have biological activity as a function of chemical structure, with structure defined in a categorical manner: the SUBSTITUENTs are the levels of the POSITION factor. For example, data from Kubinyi (http://www.kubinyi.de/dd-12.pdf) for this type of analysis are presented as follows:

factor para:
  H F Cl Br I Me H H H H H F F F Cl Cl Cl Br Br Br Me Me

factor meta:
  H H H H H H F Cl Br I Me Cl Br Me Cl Br Me Cl Br Me Me Br

observed biological activity:
  7.46 8.16 8.68 8.89 9.25 9.30 7.52 8.16 8.30 8.40 8.46
  8.19 8.57 8.82 8.89 8.92 8.96 9.00 9.35 9.22 9.30 9.52

I then think the following analysis should be appropriate:

  meta <- factor(scan(file="meta", what="character"))
  para <- factor(scan(file="para", what="character"))
  ba   <- scan(file="ba")
  rslt <- lm(ba ~ meta + para - 1)

What I wish to obtain is a coefficient for each substituent at each position, as does Kubinyi:

          H      F      Cl     Br     I      Me
  meta   0.00  -0.30   0.21   0.43   0.58   0.45
  para   0.00   0.34   0.77   1.02   1.43   1.26

However, I do not get a coefficient for the Br substituent at the para position. I would like to know if there is an error in this formulation. The technique is quite well established in the field of medicinal chemistry, and traditionally the binary incidence matrix is formed "by hand" as an intermediate step in the analysis, instead of the much simpler formulation that I am considering here.

Thank you for whatever insight you may give.

Prof. Roy Little
Dept. Chem.
Universidad de los Andes
Mérida, Venezuela
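For anyone who wants to reproduce this without the three data files, here is a minimal, self-contained sketch with the data entered inline. Two things in it are assumptions that are not in the code above: the explicit `levels = subs` ordering, and keeping the intercept rather than using `- 1`. By default R orders factor levels alphabetically, so Br becomes the first level and is treated as the reference level, which may be why no para Br coefficient is reported. Putting H first and keeping the intercept should instead make H the zero point at both positions, as in Kubinyi's table.

```r
# Substituent order used in Kubinyi's table; H first makes it the reference level.
# (Assumption: the original code relies on R's default alphabetical level order.)
subs <- c("H", "F", "Cl", "Br", "I", "Me")

para <- factor(c("H", "F", "Cl", "Br", "I", "Me", "H", "H", "H", "H", "H",
                 "F", "F", "F", "Cl", "Cl", "Cl", "Br", "Br", "Br", "Me", "Me"),
               levels = subs)
meta <- factor(c("H", "H", "H", "H", "H", "H", "F", "Cl", "Br", "I", "Me",
                 "Cl", "Br", "Me", "Cl", "Br", "Me", "Cl", "Br", "Me", "Me", "Br"),
               levels = subs)
ba <- c(7.46, 8.16, 8.68, 8.89, 9.25, 9.30, 7.52, 8.16, 8.30, 8.40, 8.46,
        8.19, 8.57, 8.82, 8.89, 8.92, 8.96, 9.00, 9.35, 9.22, 9.30, 9.52)

# With default (alphabetical) levels, "Br" sorts first and becomes the reference
default_ref <- levels(factor(as.character(para)))[1]

# Keep the intercept so that H is the zero point at both positions
rslt <- lm(ba ~ meta + para)
print(coef(rslt))
```

The intercept then plays the role of the unsubstituted (H, H) compound, and each remaining coefficient is the contribution of one substituent at one position relative to H.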