thr3ads.net - R help - [R] Problem with data conversion [Dec 2003]

Paul E. Johnson

2003-Dec-14 09:29 UTC

[R] Problem with data conversion

I sympathize with your trouble bringing in data, but you need to catch 
your breath and figure out what you really have.  I think when you get a 
bit more R practice, you will be able to manage what you bring in 
without going back to that editor so much.

I feel certain your data is not what you think it is.  Here's an example 
where a factor DOES work on the lhs of a glm:

 > y <- factor(c("S","N","S","N","S","N","S","N"))
 > x <- rnorm(8)
 > glm(y~x,family=binomial(link=logit))

Look here: the system knows y is a factor:
 > attributes(y)
$levels
[1] "N" "S"

$class
[1] "factor"

My guess is that your variables are not really factors, but rather
character vectors.  You have to convert them into factors.
Watch: the error I get is the same one you got.

 > y <- c("S","N","S","N","S","N","S","N")
 > glm(y~x,family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras, 
extranames,  :
        invalid variable type

Note the system doesn't know y is "supposed" to be a factor. It just
sees characters.

 > y
[1] "S" "N" "S" "N" "S" "N" "S" "N"
 > levels(y)
NULL
 > attributes(y)
NULL

but look:
 > glm(as.factor(y)~x,family=binomial(link=logit))
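
The same check can be sketched on a data frame. This is only an illustration: the data frame `d` and its columns `status` and `x` are made-up names, not the poster's data. It looks at the class first, then converts explicitly so there is no doubt about what glm receives.

```r
# Hypothetical data: 'status' arrives as character, not factor
set.seed(1)
d <- data.frame(status = rep(c("case", "control"), 4),
                x = rnorm(8),
                stringsAsFactors = FALSE)

class(d$status)            # check what you really have

# Convert explicitly; for a binomial glm the first factor level is
# treated as "failure" and the remaining level(s) as "success"
d$status <- factor(d$status, levels = c("control", "case"))

fit <- glm(status ~ x, family = binomial(link = "logit"), data = d)
coef(fit)
```

Setting the levels by hand also fixes which outcome the coefficients refer to, rather than leaving it to alphabetical order.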





-- 
Paul E. Johnson                       email: pauljohn at ukans.edu
Dept. of Political Science            http://lark.cc.ukans.edu/~pauljohn
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66045                FAX: (785) 864-5700

arinbasu@softhome.net

2003-Dec-14 12:19 UTC

[R] Problem with data conversion

Hi All: 

I came across the following problem while working with a dataset, and 
wondered if there could be a solution I sought here. 


My dataset consists of information on 402 individuals with the following five
variables (age, sex, status = a binary variable with levels "case" or
"control", mma, dma).

During data check, I found that in the raw data, the data entry operator had 
mistakenly put a "0" for one participant, so now, the levels show 
> levels(status)
[1] "0" "control" "case"
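
One possible way to handle a stray level like this inside R is sketched below. The vector `status` here is a small made-up stand-in, and treating the "0" entry as missing is an assumption; if the raw record shows what the value should have been, recode it to that instead.

```r
# Hypothetical stand-in for the 'status' variable
status <- factor(c("case", "control", "0", "case", "control"))

table(status, useNA = "ifany")     # see how many stray entries there are

# Treat the mistyped "0" as missing, then rebuild the factor so the
# unused level disappears
status_clean <- status
status_clean[status_clean == "0"] <- NA
status_clean <- factor(status_clean, levels = c("control", "case"))

levels(status_clean)
```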

The variables mma, and dma are actually numerical variables but in the
dataframe, they are represented as "characters". I tried to change the type
of the variables (from character to numeric) using the edit function (and
bringing up the data grid where then I made changes), but the changes were
not saved. I tried

mma1 <- as.numeric(mma) 

but I was not successful in converting mma from a character variable to a 
numeric variable. 

So, to edit and "clean" the data, I exported the dataset as a text file to
Epi Info 2002 (version 2, Windows). I used the following code:

mysubset <- subset(workingdat, select = c(age,sex,status, mma, dma))
write.table(mysubset, file="mysubset.txt", sep="\t", col.names=NA)

After I made changes in the variables using Epi Info (I created a new
variable called "statusrec" containing values "case" and "control"), I
exported the file as a ".rec" file (filename "mydata.rec"). I used the
following code to read the file in R:

require(foreign)
myData <- read.epiinfo("mydata.rec", read.deleted=NA) 

Now, the problem is this, when I want to run a logistic regression, R 
returns the following error message: 
> glm(statusrec~mma, family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames,  :
       invalid variable type


I cannot figure out the solution. I want to run a logistic regression now 
with the variable statusrec (which is a binary variable containing values 
"case" and "control"), and another
variable (say mma, which is now a numeric variable). What does the above 
error message mean and what could be a possible solution? 

Would greatly appreciate your insights and wisdom. 

 -Arin Basu

Christian Schulz

2003-Dec-14 13:23 UTC

[R] Problem with data conversion

> The variables mma, and dma are actually numerical variables but in the
> dataframe, they are represented as "characters". I tried to change the type
> of the variables (from character to numeric) using the edit function (and
> bringing up the data grid where then I made changes), but the changes were
> not saved. I tried
>
> mma1 <- as.numeric(mma)

I'm not sure I understand your problem correctly, but is it possible that you
forgot the data.frame? Suppose your data.frame is df:

df$mma <- as.numeric(as.character(df$mma))  # should work; as.character() guards against mma being a factor
df$mma[df$mma == 0] <- 1                    # or any other value
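
A slightly fuller sketch of that conversion, under the assumption that mma was read in as text and contains at least one entry that is not a number. The data frame and its values are invented for illustration. The point is to look at which raw values fail to convert before overwriting the column.

```r
# Hypothetical data frame; 'mma' was read in as text, with one bad entry
df <- data.frame(mma = c("1.2", "3.4", "n/a", "5.6"),
                 stringsAsFactors = FALSE)

# as.character() first, so this also works if mma is a factor
# (as.numeric() on a factor returns the internal level codes)
mma_num <- suppressWarnings(as.numeric(as.character(df$mma)))

# Entries that failed to convert show up as new NAs
bad <- is.na(mma_num) & !is.na(df$mma)
df$mma[bad]                 # inspect the offending raw values

df$mma <- mma_num
class(df$mma)
```

If `bad` flags nothing, the column was clean text and the conversion is safe; if it flags entries, those are the ones to fix in the raw data.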

regards,christian

Prof Brian Ripley

2003-Dec-14 13:29 UTC

[R] Problem with data conversion

The message probably means that the variable is a character variable,
neither numeric (as you intended) nor a factor.

Although you said there was a trip to epiinfo, you never said where the 
data came from.  Try dumping out the data, editing the file, and reading 
it with read.table.  There are other ways, but one of your steps has a bug 
and we have no idea what the steps actually were.

When you are finished, try

sapply(mydf, class)

on your dataframe `mydf'.  You should see only numeric or factor 
variables.
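
The dump, edit, and re-read route might look roughly like the sketch below. The data frame contents, the temporary file, and the choice of column types are all assumptions made for illustration; the actual steps depend on where the data came from.

```r
# Hypothetical data frame standing in for the real data
mydf <- data.frame(age = c(34, 51, 47),
                   status = c("case", "control", "case"),
                   mma = c(1.2, 3.4, 5.6),
                   stringsAsFactors = FALSE)

# Dump to a tab-separated text file (a temporary file here)
f <- tempfile(fileext = ".txt")
write.table(mydf, file = f, sep = "\t", row.names = FALSE, quote = FALSE)

# ... edit the file by hand if needed, then read it back,
# declaring the intended type of each column
mydf2 <- read.table(f, header = TRUE, sep = "\t",
                    colClasses = c("numeric", "factor", "numeric"))

sapply(mydf2, class)
```

Declaring colClasses up front means a column that cannot be read as the stated type produces an error at read time, instead of silently becoming character.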

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
