Inman, Brant A. M.D.
2007-Jan-06 22:56 UTC
[R] Using VGAM's vglm function for ordinal logistic regression
R-Experts: I am using the vglm function of the VGAM library to perform proportional odds ordinal logistic regression. The issue that I would like help with concerns the format in which the response variable must be provided for this function to work correctly. Consider the following example: ------ library(VGAM) library(MASS) attach(pneumo) pneumo # Inspect the format of the original dataset # Restructure the pneumo dataset into a different format pneumo2 <- data.frame(matrix(ncol=3, nrow=24)) colnames(pneumo2) <- c('exposure.time', 'severity', 'freq') pneumo2[,1] <- rep(pneumo[,1],3) pneumo2[,2] <- as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8))) pneumo2[1:8,3] <- pneumo[,2] pneumo2[9:16,3] <- pneumo[,3] pneumo2[17:24,3] <- pneumo[,4] pneumo2 # Inspect the format of the new modified dataset ------ The problem occurs when I try to analyze these two datasets, which are identical in content, with the proportional odds model using vglm: ------ # Analyze the original dataset with vglm, get one set of results> vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time),data=pneumo, + family=cumulative(parallel=T)) Coefficients: (Intercept):1 (Intercept):2 log(exposure.time) 9.676093 10.581725 -2.596807 Degrees of Freedom: 16 Total; 13 Residual Residual Deviance: 5.026826 Log-likelihood: -204.2742 # Analyzing the modified dataset with vglm gives another set of results> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2,+ family=cumulative(parallel=T)) Coefficients: (Intercept):1 (Intercept):2 log(exposure.time) -1.6306499 2.5630170 -0.1937956 Degrees of Freedom: 48 Total; 45 Residual Residual Deviance: 503.7251 Log-likelihood: -251.8626 Warning messages: 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace trace, wzeps = control$wzepsilon) 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace trace, wzeps = control$wzepsilon) 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace trace, wzeps = control$wzepsilon) 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace trace, wzeps = control$wzepsilon) 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace trace, wzeps = control$wzepsilon) # Analyzing the modified dataset with polr reproduces these second results> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2)Coefficients: log(exposure.time) 0.1938154 Intercepts: mild|normal normal|severe -1.630612 2.563055 Residual Deviance: 503.7251 AIC: 509.7251 ------ Can someone explain what I am doing wrong when using vglm and polr with the modified dataset? I do not understand why these two formulations should give different results. Brant Inman
Prof Brian Ripley
2007-Jan-07 08:52 UTC
[R] Using VGAM's vglm function for ordinal logistic regression
On Sat, 6 Jan 2007, Inman, Brant A. M.D. wrote:> > R-Experts: > > I am using the vglm function of the VGAM library to perform proportional > odds ordinal logistic regression. The issue that I would like help with > concerns the format in which the response variable must be provided for > this function to work correctly.> pneumo2$severity[1] normal normal normal normal normal normal normal normal mild mild [11] mild mild mild mild mild mild severe severe severe severe [21] severe severe severe severe Levels: mild < normal < severe is different from the ordering in the first example. The difference between vglm and polr is that the latter uses the conventional sign for the coefficients: see MASS4 p.204. I would never use as.ordered on a character vector, as this leaves it to R to choose the ordering. (Even if you think you intended alphabetical order, that depends on the locale: see the warnings on the help page.)> Consider the following example: > > ------ > > library(VGAM) > library(MASS) > > attach(pneumo) > pneumo # Inspect the format of the original dataset > > # Restructure the pneumo dataset into a different format > pneumo2 <- data.frame(matrix(ncol=3, nrow=24)) > colnames(pneumo2) <- c('exposure.time', 'severity', 'freq') > pneumo2[,1] <- rep(pneumo[,1],3) > pneumo2[,2] <- > as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8))) > pneumo2[1:8,3] <- pneumo[,2] > pneumo2[9:16,3] <- pneumo[,3] > pneumo2[17:24,3] <- pneumo[,4] > pneumo2 # Inspect the format of the new modified dataset > > ------ > > The problem occurs when I try to analyze these two datasets, which are > identical in content, with the proportional odds model using vglm: > > ------ > > # Analyze the original dataset with vglm, get one set of results > >> vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time), > data=pneumo, > + family=cumulative(parallel=T)) > > Coefficients: > (Intercept):1 (Intercept):2 log(exposure.time) > 9.676093 10.581725 -2.596807 > > Degrees of Freedom: 16 Total; 13 Residual > Residual Deviance: 5.026826 > Log-likelihood: -204.2742 > > # Analyzing the modified dataset with vglm gives another set of results > >> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2, > + family=cumulative(parallel=T)) > > Coefficients: > (Intercept):1 (Intercept):2 log(exposure.time) > -1.6306499 2.5630170 -0.1937956 > > Degrees of Freedom: 48 Total; 45 Residual > Residual Deviance: 503.7251 > Log-likelihood: -251.8626 > Warning messages: > 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > > # Analyzing the modified dataset with polr reproduces these second > results > >> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2) > > Coefficients: > log(exposure.time) > 0.1938154 > > Intercepts: > mild|normal normal|severe > -1.630612 2.563055 > > Residual Deviance: 503.7251 > AIC: 509.7251 > > ------ > > Can someone explain what I am doing wrong when using vglm and polr with > the modified dataset? I do not understand why these two formulations > should give different results. > > Brant Inman > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Inman, Brant A. M.D.
2007-Jan-07 16:27 UTC
[R] Using VGAM's vglm function for ordinal logistic regression
Thank you for the help. Indeed, the differences in the results that I noted were due to the incorrect ordering of the response variable that resulted from my unattentive use of as.ordered on a character vector. Brant -----Original Message----- From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] Sent: Sunday, January 07, 2007 2:52 AM To: Inman, Brant A. M.D. Cc: r-help at stat.math.ethz.ch; yee at stat.auckland.ac.nz Subject: Re: [R] Using VGAM's vglm function for ordinal logistic regression On Sat, 6 Jan 2007, Inman, Brant A. M.D. wrote:> > R-Experts: > > I am using the vglm function of the VGAM library to performproportional> odds ordinal logistic regression. The issue that I would like helpwith> concerns the format in which the response variable must be providedfor> this function to work correctly.> pneumo2$severity[1] normal normal normal normal normal normal normal normal mild mild [11] mild mild mild mild mild mild severe severe severe severe [21] severe severe severe severe Levels: mild < normal < severe is different from the ordering in the first example. The difference between vglm and polr is that the latter uses the conventional sign for the coefficients: see MASS4 p.204. I would never use as.ordered on a character vector, as this leaves it to R to choose the ordering. (Even if you think you intended alphabetical order, that depends on the locale: see the warnings on the help page.)> Consider the following example: > > ------ > > library(VGAM) > library(MASS) > > attach(pneumo) > pneumo # Inspect the format of the original dataset > > # Restructure the pneumo dataset into a different format > pneumo2 <- data.frame(matrix(ncol=3, nrow=24)) > colnames(pneumo2) <- c('exposure.time', 'severity', 'freq') > pneumo2[,1] <- rep(pneumo[,1],3) > pneumo2[,2] <- > as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8))) > pneumo2[1:8,3] <- pneumo[,2] > pneumo2[9:16,3] <- pneumo[,3] > pneumo2[17:24,3] <- pneumo[,4] > pneumo2 # Inspect the format of the new modified dataset > > ------ > > The problem occurs when I try to analyze these two datasets, which are > identical in content, with the proportional odds model using vglm: > > ------ > > # Analyze the original dataset with vglm, get one set of results > >> vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time), > data=pneumo, > + family=cumulative(parallel=T)) > > Coefficients: > (Intercept):1 (Intercept):2 log(exposure.time) > 9.676093 10.581725 -2.596807 > > Degrees of Freedom: 16 Total; 13 Residual > Residual Deviance: 5.026826 > Log-likelihood: -204.2742 > > # Analyzing the modified dataset with vglm gives another set ofresults> >> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2, > + family=cumulative(parallel=T)) > > Coefficients: > (Intercept):1 (Intercept):2 log(exposure.time) > -1.6306499 2.5630170 -0.1937956 > > Degrees of Freedom: 48 Total; 45 Residual > Residual Deviance: 503.7251 > Log-likelihood: -251.8626 > Warning messages: > 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace > trace, wzeps = control$wzepsilon) > > # Analyzing the modified dataset with polr reproduces these second > results > >> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2) > > Coefficients: > log(exposure.time) > 0.1938154 > > Intercepts: > mild|normal normal|severe > -1.630612 2.563055 > > Residual Deviance: 503.7251 > AIC: 509.7251 > > ------ > > Can someone explain what I am doing wrong when using vglm and polrwith> the modified dataset? I do not understand why these two formulations > should give different results. > > Brant Inman > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595