thr3ads.net - R help - [R] Studdy Missing Data, differentiate between a percent with in the valid answers and with in the different missing answers [Mar 2008]

Ericka Lundström

2008-Mar-03 07:21 UTC

[R] Studdy Missing Data, differentiate between a percent with in the valid answers and with in the different missing answers

Hi R experts

I'm trying to migrate from SPSS to R, though I have some problems with
getting R to distinguish between the different kinds of missing values.

I want to distinguish between data that are missing because a
respondent refused to answer and data that are missing because the
question didn't apply to that respondent. In other words, I want to
create data values where I control what are valid and what are
missing observations, so I can study both the valid and the missing
observations.

SPSS does this quite smoothly. It looks something like this in SPSS:

Get paid appropriately, considering efforts and achievements

N   Valid     947
    Missing   558

                                     Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Agree strongly                     98      6,5           10,3                10,3
         Agree                             408     27,1           43,1                53,4
         Neither agree nor disagree        126      8,4           13,3                66,7
         Disagree                          259     17,2           27,3                94,1
         Disagree strongly                  56      3,7            5,9               100,0
         Total                             947     62,9          100,0
Missing  Not applicable                    534     35,5
         Don't know                          1       ,1
         No answer                          23      1,5
         Total                             558     37,1
Total                                     1505    100,0

(If the table gets messy and you can't read it in your email program,
there is a nicely formatted SPSS table here
https://stat.ethz.ch/pipermail/r-help/1998-October/002942.html
where K. Mueller asked a very similar question in 1998!)

SPSS is metacategorizing or recognizing if my variables are Missing  
or Valid. This means that, besides differentiating between missing  
and valid, the categories within missing are treated separately.
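
As a worked check of what SPSS is doing here, the three percentage columns can be recomputed in base R from the frequencies in the table above. This is only a sketch of the arithmetic: the counts are copied from the SPSS output, and the variable names (`valid`, `missing`, and so on) are made up for the illustration.

```r
# Frequencies copied from the SPSS table above
valid <- c("Agree strongly" = 98, "Agree" = 408,
           "Neither agree nor disagree" = 126,
           "Disagree" = 259, "Disagree strongly" = 56)
missing <- c("Not applicable" = 534, "Don't know" = 1, "No answer" = 23)

n_valid <- sum(valid)              # valid answers only
n_total <- n_valid + sum(missing)  # all respondents

# "Percent": every category, valid or missing, relative to all respondents
percent <- round(100 * c(valid, missing) / n_total, 1)

# "Valid Percent": valid categories relative to valid answers only
valid_percent <- round(100 * valid / n_valid, 1)

# "Cumulative Percent": running total of the valid answers
cum_percent <- round(100 * cumsum(valid) / n_valid, 1)

print(cbind(valid, percent = percent[names(valid)], valid_percent, cum_percent))
```

The point is that "Percent" and "Valid Percent" differ only in the denominator (all respondents versus valid answers), so both can be derived as long as the missing categories are still countable.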

# At the moment I'm only able to get this information from R:
 > describe(ess3dk$PDAPRP)
ess3dk$PDAPRP : Get paid appropriately, considering efforts and  
achievements
       n missing  unique
    1505       0       8

Agree strongly (98, 7%),
Agree (408, 27%)
Neither agree nor disagree (126, 8%),
Disagree (259, 17%)
Disagree strongly (56, 4%),
Not applicable (534, 35%)
Don't know (1, 0%),
No answer (23, 2%)

# Then I can recode 'Not applicable', 'Don't know' and 'No answer' as missing:
 > ess3dk[ess3dk$PDAPRP=="Not applicable" |
 +        ess3dk$PDAPRP=="Don't know" |
 +        ess3dk$PDAPRP=="No answer", "PDAPRP"] <- NA

# But that just piles 'Not applicable', 'Don't know' and 'No answer'
# together in "missing":
 > describe(ess3dk$PDAPRP)
ess3dk$PDAPRP : Get paid appropriately, considering efforts and  
achievements
       n missing  unique
     947     558       5

Agree strongly (98, 10%),
Agree (408, 43%)
Neither agree nor disagree (126, 13%),
Disagree (259, 27%)
Disagree strongly (56, 6%)
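
One way to avoid losing the reason for missingness when recoding, sketched here in base R, is to leave the raw column untouched and derive two new columns from it: one with only the valid answers and one with only the missing reasons. The data below are made up for illustration (they are not the ESS data), and the column names `raw`, `valid` and `miss_reason` are hypothetical.

```r
# Made-up answers, for illustration only (not the ESS data)
survey <- data.frame(
  raw = c("Agree", "Not applicable", "Disagree",
          "No answer", "Agree", "Not applicable"),
  stringsAsFactors = FALSE
)

# Codes that should count as missing
missing_codes <- c("Not applicable", "Don't know", "No answer")
is_missing <- survey$raw %in% missing_codes

# Valid answers only; the missing codes become NA
survey$valid <- factor(ifelse(is_missing, NA, survey$raw))

# Reasons for missingness only; the valid answers become NA
survey$miss_reason <- factor(ifelse(is_missing, survey$raw, NA))

valid_counts  <- table(survey$valid)        # counts within the valid answers
reason_counts <- table(survey$miss_reason)  # counts within the missing answers
valid_pct     <- round(100 * prop.table(valid_counts), 1)

print(valid_pct)
print(reason_counts)
```

Because the raw column is never overwritten, both the valid percentages and the breakdown of the missing categories stay available.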

Is there a smart way in R to differentiate between missing and valid  
and at the same time treat both the categories within missing and  
valid as answers (like SPSS did above)?

I'm using an SPSS data set (.sav/.por) from the European Social Survey
(the ESS)
http://ess.nsd.uib.no/index.jsp?module=download&year=2007&country=&download=%5CDirect+Data+download%5C2007%5C01%23ESS3+-+integrated+file%2C+edition+2.0%5C.%5CESS3e02.spss.zip
which I import via spss.get like this:
 > ess3dk <- spss.get("filename.sav", lowernames=FALSE, datevars=NULL,
 +     use.value.labels=TRUE, to.data.frame=TRUE, max.value.labels=Inf,
 +     force.single=TRUE, allow=NULL, charfactor=FALSE)

I have read the help files for spss.get and read.spss to see if this
subject was mentioned, and I have looked around this mailing list. I
have found one question that is very similar, here
https://stat.ethz.ch/pipermail/r-help/1998-October/002942.html
(from October 1998!), but I can't find an answer to it anywhere.

Here is some self-contained, reproducible code:

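library(Hmisc) # provides describe()

dataFrame <- data.frame(ONE = c(2, 1, 3, 2, NA, 4, 2),
                        TWO = c("yes", "?", "No", "X", "No", "?", "X"),
                        AGE = c(42, 18, 49, 62, NA, 19, 82),
                        stringsAsFactors = TRUE)
# I create a simple data frame. TWO is stored as a factor
# (stringsAsFactors = TRUE was the default before R 4.0.0).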

describe(dataFrame$TWO) # then I have a look at the "TWO" column.
# Here I can see every answer.

dataFrame[dataFrame$TWO == "?" | dataFrame$TWO == "X", "TWO"] <- NA
# Now I classify the answers "X" and "?" as missing, because I want to
# know the valid percent (yes and no), but I don't want to delete the
# "X" and the "?" answers.

describe(dataFrame$TWO) # then I have another look at the "TWO" column.
# Now I can't see how many answered "X" and how many answered "?"

# My question is whether it's possible in R to work with a metacategory of
# valid and not valid answers, as described above. In other words, I
# want to, as is possible in SPSS, distinguish between a percent within
# the valid answers and a percent overall.
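
For the toy data above, the distinction between "percent overall" and "percent within the valid answers" can be sketched with a small base R helper. This is only one possible approach under stated assumptions, not an existing Hmisc or SPSS-import feature: the function name `spss_freq` and its argument `missing_codes` are hypothetical, and the helper expects the raw answers before anything has been recoded to NA.

```r
# Hypothetical helper: SPSS-style frequency table from raw answers.
# 'missing_codes' lists the values to be treated as missing.
spss_freq <- function(x, missing_codes) {
  counts <- table(as.character(x), useNA = "no")
  n <- as.vector(counts)
  is_missing <- names(counts) %in% missing_codes

  out <- data.frame(
    value   = names(counts),
    status  = ifelse(is_missing, "Missing", "Valid"),
    n       = n,
    # percent of all answers, valid and missing alike
    percent = round(100 * n / sum(n), 1),
    # percent of the valid answers only; NA for the missing codes
    valid_percent = ifelse(is_missing, NA,
                           round(100 * n / sum(n[!is_missing]), 1)),
    stringsAsFactors = FALSE
  )
  # list the valid categories first, then the missing ones
  out[order(out$status == "Missing"), ]
}

# The TWO column from the example, before any recoding to NA
answers <- c("yes", "?", "No", "X", "No", "?", "X")
freq <- spss_freq(answers, missing_codes = c("?", "X"))
print(freq)
```

Keeping the missing codes as ordinary values until the table is built is what makes both denominators available at the same time.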

I normally use this method to quickly get an overview of missing and
valid answers and the percentage distribution within the
missing and valid answers, so I would like to find some smart
solution to this problem. I would really appreciate an answer or some
help to point me in the right direction.

Thanks in advance

Regards

Ericka Lundström

James Reilly

2008-Mar-03 09:02 UTC

On 3/3/08 8:21 PM, Ericka Lundström wrote:
 > I'm trying to migrate from SPSS to R, though I have some problems with
 > getting R to distinguish between the different kinds of missing values.
...
 > Is there a smart way in R to differentiate between missing and valid
 > and at the same time treat both the categories within missing and
 > valid as answers (like SPSS did above)


The Hmisc package has some support for special missing values, for 
instance when reading in SAS datasets using sas.get. I don't believe 
spss.get offers the same facility, though.

You can define special missing values for a variable manually, which 
might seem a bit involved, but this could easily be automated. For your 
example, try:

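# Assumes dataFrame$TWO is a factor that still contains "?" and "X"
special <- dataFrame$TWO %in% c("?", "X")

# Record which special code occurred at which observation
attr(dataFrame$TWO, "special.miss") <-
     list(codes = as.character(dataFrame$TWO[special]),
          obs = which(special))
class(dataFrame$TWO) <- c("factor", "special.miss")

# Finally set those observations to NA
is.na(dataFrame$TWO) <- special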

# Then describe gives new percentages

describe(dataFrame$TWO)
dataFrame$TWO
       n missing       ?       X  unique
       3       4       2       2       2

No (2, 67%), yes (1, 33%)
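
The manual steps above could be automated along the following lines. This is a minimal sketch, not an Hmisc function: the name `set_special_miss` is hypothetical, and the layout of the "special.miss" attribute (a list with `codes` and `obs`) simply mirrors the code in this reply. How `describe` reports the special codes may depend on the installed Hmisc version.

```r
# Hypothetical helper wrapping the manual steps from this reply
set_special_miss <- function(x, codes) {
  x <- as.factor(x)
  special <- x %in% codes

  # Remember which code occurred where, before overwriting with NA
  info <- list(codes = as.character(x[special]),
               obs   = which(special))

  is.na(x) <- special                      # special codes become NA
  attr(x, "special.miss") <- info          # keep the codes as an attribute
  class(x) <- c("factor", "special.miss")  # same class vector as above
  x
}

two <- set_special_miss(c("yes", "?", "No", "X", "No", "?", "X"),
                        codes = c("?", "X"))

# Counts per special missing code, recovered from the attribute
special_counts <- table(attr(two, "special.miss")$codes)
print(special_counts)
```

With such a helper, the same recoding could be applied to several survey variables, for example by looping over the relevant columns of the data frame.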

HTH,
James
-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
