Michael Friendly
2012-Apr-16 17:43 UTC
[R] Problems with subset, droplevels and lm: variable lengths differ
[Env: R 2.14.2 / Win XP]

In the script below, I want to select some variables from rrcov::OsloTransect, delete cases with any missing data, and subset the data frame Oslo to remove cases for two levels of the factor litho that occur with low frequency.

The checks I run on my new data frame Oslo look OK, but when I try to fit a multivariate linear model with lm(), I get an error: variable lengths differ (found for 'litho'). How can I fix this?

> data(OsloTransect, package="rrcov")
> # keep a subset of variables & rename some variables
> Oslo <- OsloTransect[c("X.ID", "XCOO", "YCOO", "X.FOREST", "X.WEATHER", "X.FLITHO", "ALT")]
> colnames(Oslo) <- c("site", "XC", "YC", "forest", "weather", "litho", "altitude")
> Oslo <- cbind(Oslo, OsloTransect[,c("Cu", "Fe", "K", "Mg", "Mn", "P", "Zn")])
> # make site a factor
> Oslo[,"site"] <- factor(Oslo[,"site"])
>
> # log transform the chemical elements
> Oslo[,8:14] <- log(Oslo[,8:14])
>
> # delete cases with missing data
> Oslo <- Oslo[complete.cases(Oslo),]
> nrow(Oslo)
[1] 350
>
> # delete low frequency litho=="GNEID_O" | "MICSH"
> Oslo <- subset(Oslo, !litho %in% c("GNEID_O", "MICSH"), drop=TRUE)
> nrow(Oslo)
[1] 332
> Oslo <- droplevels(Oslo)
> table(Oslo$litho)

 CAMSED GNEIS_O GNEIS_R    MAGM
     98      89      32     113
> nrow(Oslo)
[1] 332
> mod1 <- lm(cbind("Cu", "Fe", "K", "Mg", "Mn", "P", "Zn") ~ litho + forest + weather, data=Oslo)
Error in model.frame.default(formula = cbind("Cu", "Fe", "K", "Mg", "Mn", :
  variable lengths differ (found for 'litho')
>

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
Joshua Wiley
2012-Apr-16 17:54 UTC
[R] Problems with subset, droplevels and lm: variable lengths differ
Hi Michael,

Just a silly error. It should be:

mod1 <- lm(cbind(Cu, Fe, K, Mg, Mn, P, Zn) ~ litho + forest + weather, data=Oslo)

With the names quoted, cbind() returns a 1 x 7 character matrix, and that is what you are trying to regress on.

Cheers,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
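To see why the quoted form fails, here is a minimal sketch. It uses a small made-up data frame (`toy`, with hypothetical columns `Cu`, `Fe` and `litho`) rather than the Oslo data, so it runs without the rrcov package. It only illustrates the mechanism described in the reply.

```r
# Quoted names are plain character strings, so cbind() builds a
# 1 x 7 character matrix instead of picking up columns of the data.
y_quoted <- cbind("Cu", "Fe", "K", "Mg", "Mn", "P", "Zn")
dim(y_quoted)      # 1 7
typeof(y_quoted)   # "character"

# Hypothetical stand-in data (not the Oslo data): 10 rows, two responses, one factor.
set.seed(1)
toy <- data.frame(Cu = rnorm(10), Fe = rnorm(10),
                  litho = factor(rep(c("a", "b"), each = 5)))

# Quoted: the "response" has 1 row while litho has 10, so building the
# model frame fails. try() captures the error so the script keeps going.
res <- try(lm(cbind("Cu", "Fe") ~ litho, data = toy), silent = TRUE)
inherits(res, "try-error")   # TRUE

# Unquoted: the names are looked up in `toy`, giving a 10 x 2 numeric
# response matrix and a multivariate linear model.
fit <- lm(cbind(Cu, Fe) ~ litho, data = toy)
class(fit)                   # "mlm" "lm"
```

The error message names 'litho' because it is the first variable whose length disagrees with the one-row response, not because anything is wrong with the factor itself.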
Peter Ehlers
2012-Apr-16 18:21 UTC
[R] Problems with subset, droplevels and lm: variable lengths differ
Michael,

Unless I'm missing something, don't you just have to drop the quotes in your cbind()?

Peter Ehlers
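For reference, the same subset / droplevels / lm sequence can be sketched with unquoted names on the built-in iris data. This is a stand-in chosen so the example is self-contained; the column names and the dropped level are not from the thread.

```r
# Stand-in for the Oslo workflow using built-in data (an assumption;
# the thread itself uses rrcov::OsloTransect).
dat <- subset(iris, !Species %in% c("setosa"))
nrow(dat)                 # 100
nlevels(dat$Species)      # 3: subset() keeps the now-unused level

# droplevels() removes factor levels that no longer occur.
dat <- droplevels(dat)
nlevels(dat$Species)      # 2

# Unquoted column names inside cbind() give a multivariate response.
mod <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = dat)
dim(coef(mod))            # 2 2: one column of coefficients per response
```

So the subset() and droplevels() steps in the original script were behaving as intended. Only the quoting inside cbind() needed to change.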