The R documentation for some of the foreign package's functions says
that the set of variable labels becomes attributes in the resulting
data frame.

Thus, e.g., 5="strongly agree", 4="agree", etc.

I'm happy that the labels are being passed, but unfortunately, when
R summarizes the data, it will list it only as categories, and
doesn't deal with the corresponding numbers.  It seems as though
the numbers attached to the categories don't exist.

Is there a way to make R go back and forth between the categories and
the corresponding numbers as Stata does, or do I just have to set
convert.factors=FALSE?

Hope everyone's enjoying the April snow!
Thanks,

Janet

> MC<-read.dta("C:/Documents and Settings/janet/Desktop/poleff/mexchn_gary.dta")
> summary(MC)
       id             country              code         sex
 Min.   :10100001   Length:1068        Mexico:604   Female:541
 1st Qu.:10100306   Mode  :character   China :464   Male  :509
 Median :14000071                                   NA's  : 18
 Mean   :12305905
 3rd Qu.:14000339
 Max.   :14000628
> mean(MC$id)
[1] 12305905
> mean(MC$sex)
[1] NA
Warning message:
argument is not numeric or logical: returning NA in: mean.default(MC$sex)

Stata gives:

. summ

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+-----------------------------------------------------
          id |    1068    1.23e+07    1934101   1.01e+07   1.40e+07
     country |       0
        code |    1068    .4344569   .4959177          0          1
         sex |    1050    1.484762   .5000059          1          2
> tst.df <- data.frame(a=letters[1:3], b=factor(c(3:5)))
> tst.df
  a b
1 a 3
2 b 4
3 c 5
> as.numeric(tst.df$a)
[1] 1 2 3
> as.numeric(tst.df$b)
[1] 1 2 3
> as.character(tst.df$b)
[1] "3" "4" "5"
> as.numeric(as.character(tst.df$b))
[1] 3 4 5

Does this answer your question?
Spencer Graves

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
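The transcript above turns on the difference between a factor's internal integer codes and the values its labels spell out. The following is a minimal sketch of that distinction, not part of the original reply. The names f, raw, likert and lookup are made up for illustration, and only the labels for 4 and 5 come from the original question. Note also that in R 4.0.0 and later, data.frame() no longer converts character columns such as tst.df$a to factors by default, so as.numeric(tst.df$a) would return NA there rather than 1 2 3.

```r
# Factor whose labels are numerals, as in tst.df$b above.
f <- factor(c(3, 4, 5, 5))
codes  <- as.integer(f)               # positions in levels(f): 1 2 3 3
values <- as.numeric(levels(f))[f]    # numbers spelled by the labels: 3 4 5 5

# Hypothetical Likert item; only the labels for 4 and 5 appear in the post.
raw    <- c(5, 4, 5)
likert <- factor(raw, levels = c(4, 5), labels = c("agree", "strongly agree"))
pos    <- as.integer(likert)          # 2 1 2: positions, not the original 5 4 5

# A named lookup vector maps the labels back to the original numbers.
lookup <- c("agree" = 4, "strongly agree" = 5)
back   <- unname(lookup[as.character(likert)])   # 5 4 5

print(codes)
print(values)
print(pos)
print(back)
```

The point of the second half is that the integer codes only coincide with the original survey values when those values happen to be 1, 2, 3, and so on in level order; otherwise an explicit lookup (or reading the file with convert.factors=FALSE) seems to be the safer route.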
On Tue, 8 Apr 2003, janet rosenbaum wrote:

>
> The R documentation for some of the foreign package's functions says
> that the set of variable labels becomes attributes in the resulting
> data frame.
>
> Thus, e.g., 5="strongly agree", 4="agree", etc.
>
> I'm happy that the labels are being passed, but unfortunately, when
> R summarizes the data, it will list it only as categories, and
> doesn't deal with the corresponding numbers.  It seems as though
> the numbers attached to the categories don't exist.
>
> Is there a way to make R go back and forth between the categories and
> the corresponding numbers as Stata does, or do I just have to set
> convert.factors=FALSE ?

In this particular case I don't see why you would want the numbers, but
the function as.numeric() will extract the underlying numbers from a
factor.

eg
   mean(as.numeric(MC$sex))
or
   mean(as.numeric(MC$code))
should work, but
   mean(MC$sex=="Male")
or
   mean(MC$code=="China")
should also work and seem clearer to me.

	-thomas
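Two details may matter when trying these suggestions against the numbers in the original post; the sketch below is an illustration, not part of the reply. The real MC data frame is not available, so sex and code are synthetic stand-ins rebuilt from the counts in the summary() output (541 Female, 509 Male, 18 missing; 604 Mexico, 464 China). The level order and the Stata codings (1/2 for sex, 0/1 for code) are assumptions inferred from the summary() and Stata output. Under those assumptions, the 18 missing values make a plain mean() return NA unless na.rm=TRUE is given, and the factor codes for code run 1/2 rather than Stata's 0/1.

```r
# Synthetic stand-ins built from the counts in the post (assumption:
# level order as printed by summary(), Female/Male and Mexico/China).
sex  <- factor(rep(c("Female", "Male", NA), times = c(541, 509, 18)),
               levels = c("Female", "Male"))
code <- factor(rep(c("Mexico", "China"), times = c(604, 464)),
               levels = c("Mexico", "China"))

m_sex_na   <- mean(as.numeric(sex))                 # NA: missing values propagate
m_sex      <- mean(as.numeric(sex), na.rm = TRUE)   # about 1.4848, as in Stata
m_code     <- mean(as.numeric(code)) - 1            # shift 1/2 codes to 0/1: about 0.4345
prop_male  <- mean(sex == "Male", na.rm = TRUE)     # share of males among non-missing
prop_china <- mean(code == "China")                 # about 0.4345, same as m_code

print(c(m_sex, m_code, prop_male, prop_china))
```

The logical-comparison form avoids having to remember how the levels are coded, which may be part of why it reads as the clearer of the two.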