I sympathize with your trouble bringing in data, but you need to catch your breath and figure out what you really have. I think that with a bit more R practice you will be able to manage what you bring in without going back to that editor so much. I feel certain your data is not what you think it is.

Here's an example where a factor DOES work on the left-hand side of a glm:

> y <- factor(c("S","N","S","N","S","N","S","N"))
> x <- rnorm(8)
> glm(y~x,family=binomial(link=logit))

Look here: the system knows y is a factor:

> attributes(y)
$levels
[1] "N" "S"

$class
[1] "factor"

My guess is that your variables are not really factors, but rather character vectors. You have to convert them into factors. Watch: the error I get is the same one you got.

> y <- c("S","N","S","N","S","N","S","N")
> glm(y~x,family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
        invalid variable type

Note that the system doesn't know y is "supposed" to be a factor. It just sees characters.

> y
[1] "S" "N" "S" "N" "S" "N" "S" "N"
> levels(y)
NULL
> attributes(y)
NULL

But look:

> glm(as.factor(y)~x,family=binomial(link=logit))

arinbasu at softhome.net wrote:
> [...]
> Now, the problem is this, when I want to run a logistic regression, R
> returns the following error message:
>
>> glm(statusrec~mma, family=binomial(link=logit))
>
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames, :
> invalid variable type
> [...]
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help

--
Paul E. Johnson                       email: pauljohn at ukans.edu
Dept. of Political Science            http://lark.cc.ukans.edu/~pauljohn
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66045                FAX: (785) 864-5700
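A minimal, self-contained sketch of the factor-versus-character point, assuming a current R session; the vectors `y_chr` and `x` are made-up toy data, not anything from the thread. Converting the character response with `factor()` gives `glm()` a categorical response it can model. For a factor response, the binomial family treats the first level as "failure" and models the probability of the other level, so the level order decides which outcome the coefficients describe.

```r
# Made-up toy data: a character response and a numeric predictor
y_chr <- c("S", "N", "S", "N", "S", "N", "S", "N")
x <- c(0.3, -1.2, 1.5, 0.4, -0.2, -0.8, 0.9, 1.1)

# A character vector carries no levels, so R cannot know it is "supposed" to be categorical
stopifnot(is.null(levels(y_chr)))

# Convert explicitly; levels are sorted, so "N" becomes the first (reference) level
y <- factor(y_chr)
print(levels(y))

# With a factor response, binomial models P(y is not the first level), here P(y == "S")
fit <- glm(y ~ x, family = binomial(link = "logit"))
print(coef(fit))

# To model P(y == "N") instead, make "S" the reference level; coefficients flip sign
y2 <- relevel(y, ref = "S")
fit2 <- glm(y2 ~ x, family = binomial(link = "logit"))
print(coef(fit2))
```

Using `relevel()` rather than re-typing the data keeps the values intact and only changes which outcome counts as "success".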
Hi All:

I came across the following problem while working with a dataset, and wondered whether I could find a solution here.

My dataset consists of information on 402 individuals with the following five variables: age, sex, status (a binary variable with levels "case" or "control"), mma, and dma.

During a data check, I found that in the raw data the data entry operator had mistakenly put a "0" for one participant, so now the levels show

> levels(status)
[1] "0" "control" "case"

The variables mma and dma are actually numerical variables, but in the data frame they are represented as "characters". I tried to change the type of the variables (from character to numeric) using the edit function (bringing up the data grid, where I then made changes), but the changes were not saved. I tried

mma1 <- as.numeric(mma)

but I was not successful in converting mma from a character variable to a numeric variable.

So, to edit and "clean" the data, I exported the dataset as a text file to Epi Info 2002 (version 2, Windows). I used the following code:

mysubset <- subset(workingdat, select = c(age, sex, status, mma, dma))
write.table(mysubset, file="mysubset.txt", sep="\t", col.names=NA)

After I made changes in the variables using Epi Info (I created a new variable called "statusrec" containing the values "case" and "control"), I exported the file as a ".rec" file (filename "mydata.rec"). I used the following code to read the file in R:

require(foreign)
myData <- read.epiinfo("mydata.rec", read.deleted=NA)

Now, the problem is this: when I want to run a logistic regression, R returns the following error message:

> glm(statusrec~mma, family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
        invalid variable type

I cannot figure out the solution.
I want to run a logistic regression now with the variable statusrec (a binary variable containing the values "case" and "control") and another variable (say mma, which is now a numeric variable). What does the above error message mean, and what could be a possible solution?

Would greatly appreciate your insights and wisdom.

-Arin Basu
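A minimal sketch of cleaning the stray "0" level inside R instead of round-tripping through Epi Info, assuming `status` is a factor column of a data frame. The small `workingdat` below is a made-up stand-in (only `status` and `mma`, with invented values). Recoding the bad entry to `NA` and calling `droplevels()` removes the unused level; with the default `na.action` (`na.omit`), `glm()` then drops that row.

```r
# Made-up stand-in for the data described above: one bad "0" entry in status
workingdat <- data.frame(
  status = factor(c("case", "control", "0", "case", "control", "case", "control", "case")),
  mma    = c(2.1, 0.7, 1.5, 1.1, 1.9, 2.6, 0.4, 0.9)
)
print(levels(workingdat$status))   # includes the stray "0"

# Mark the bad entry as missing, then drop the now-unused "0" level
workingdat$status[workingdat$status == "0"] <- NA
workingdat$status <- droplevels(workingdat$status)

# Put "control" first so the model estimates the probability of being a case
workingdat$status <- relevel(workingdat$status, ref = "control")
print(levels(workingdat$status))

# Pass the data frame via data= so the formula uses its columns;
# the row with the missing status is dropped under the default na.action (na.omit)
fit <- glm(status ~ mma, family = binomial(link = "logit"), data = workingdat)
print(summary(fit)$coefficients)
```

Supplying `data =` also avoids depending on whichever copy of `statusrec` or `mma` happens to be visible in the workspace.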
> The variables mma, and dma are actually numerical variables but in the
> dataframe, they are represented as "characters". I tried to change the type
> of the variables (from character to numeric) using the edit function (and
> bringing up the data grid where then I made changes), but the changes were
> not saved. I tried
>
> mma1 <- as.numeric(mma)

I'm not sure I understand your problem correctly, but is it possible that you forgot the data frame, i.e. the converted values were never stored back into it? Suppose your data frame is df; then

df$mma <- as.numeric(df$mma)

should work, and a stray 0 can be recoded with

df$mma[df$mma == 0] <- 1   # or any other value

regards,
christian
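A small sketch of the point about the data frame, plus one related trap that might explain why `as.numeric(mma)` seemed to fail; the data frame `df` and its values are made up. A result computed into a free-standing variable such as `mma1` never reaches the data frame, so it has to be assigned back to the column. If the column is a factor rather than character (`read.table()` converted strings to factors by default before R 4.0.0), `as.numeric()` returns the internal level codes, so convert through `as.character()` first.

```r
# Made-up data frame: mma arrives as text, as described in the question
df <- data.frame(mma = c("2.5", "10", "3.75"), stringsAsFactors = FALSE)

# Converting into a separate variable leaves the data frame untouched
mma1 <- as.numeric(df$mma)
print(class(df$mma))   # still "character"

# Assign the result back into the column instead
df$mma <- as.numeric(df$mma)
print(class(df$mma))   # now "numeric"

# The factor trap: levels sort as strings ("10" < "2.5" < "3.75"),
# so as.numeric() gives the level codes 2 1 3, not the values
f <- factor(c("2.5", "10", "3.75"))
print(as.numeric(f))
print(as.numeric(as.character(f)))   # the actual values 2.5, 10, 3.75
```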
The message probably means that the variable is a character variable, neither numeric (as you intended) nor a factor.

Although you said there was a trip to epiinfo, you never said where the data came from. Try dumping out the data, editing the file, and reading it with read.table. There are other ways, but one of your steps has a bug and we have no idea what the steps actually were.

When you are finished, try

sapply(mydf, class)

on your dataframe `mydf'. You should see only numeric or factor variables.

On Sun, 14 Dec 2003 arinbasu at softhome.net wrote:
> [...]
> Now, the problem is this, when I want to run a logistic regression, R
> returns the following error message:
>
> > glm(statusrec~mma, family=binomial(link=logit))
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames, :
> invalid variable type
> [...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
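A minimal sketch of the `sapply(..., class)` check, run on a made-up data frame `mydf` that mimics the problem: a text response and a numeric column stored as text (names and values are invented for illustration). Any column reported as `"character"` still needs converting before it goes into a model formula.

```r
# Made-up data frame mimicking the problem: every column read in as text
mydf <- data.frame(
  statusrec = c("case", "control", "case", "control", "case", "control"),
  mma       = c("2.1", "0.7", "1.5", "1.1", "1.9", "0.4"),
  stringsAsFactors = FALSE
)
print(sapply(mydf, class))   # both columns report "character"

# Convert: the measurement via as.numeric(), the outcome via factor()
mydf$mma <- as.numeric(mydf$mma)
mydf$statusrec <- factor(mydf$statusrec, levels = c("control", "case"))

# Re-run the check: every column should now be numeric or factor
classes <- sapply(mydf, class)
print(classes)
stopifnot(all(classes %in% c("numeric", "factor")))
```

Listing the levels explicitly in `factor()` fixes "control" as the reference level and would turn any unexpected code, such as a stray "0", into `NA` rather than a third level.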