thr3ads.net - R help - [R] Recomendation on data management problem [Jul 2012]

jose Bartolomei

2012-Jul-20 19:44 UTC

[R] Recomendation on data management problem

Dear R users,



I am dealing with a data set of approximately 5 million rows with data inconsistencies.

The data.frame has one observation per claim, with approximately 2 million unique IDs.

Furthermore, one individual can have one or more claims.

I have found that an individual may have all of his/her information in some but not all claims, as in example 1:



Id: 1
gender birthdate2
F 		1994-01-28
<NA>
F 		1994-01-28
F 		1994-01-28
F 		1994-01-28
F 		1994-01-28

Or an individual could have all of his/her information, but with an apparent data entry mistake, as in the last row of the gender column in example 2:

id: 2
gender birthdate2
F 		2008-07-02
F 		2008-07-02
F 		2008-07-02
F 		2008-07-02
F 		2008-07-02
M 		2008-07-02






Those are two examples of the mixed situations that I have found.
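
To make the two situations concrete, here is a minimal sketch of how such ids could be flagged. It is only an illustration on toy data that mirrors the two examples: the data.frame name `claims` and the result names `fillable_ids` and `conflict_ids` are made up here, and only the gender column is shown.

```r
# Toy data mirroring examples 1 and 2 (names are hypothetical)
claims <- data.frame(
  id     = rep(c(1, 2), each = 6),
  gender = c("F", NA, "F", "F", "F", "F",
             "F", "F", "F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing gender values per id
n_values <- tapply(claims$gender, claims$id,
                   function(x) length(unique(x[!is.na(x)])))

# Whether an id has at least one missing gender value
has_na <- tapply(claims$gender, claims$id,
                 function(x) any(is.na(x)))

# ids like example 1: one consistent value plus some missing rows
fillable_ids <- names(n_values)[n_values == 1 & has_na]

# ids like example 2: more than one distinct value recorded
conflict_ids <- names(n_values)[n_values > 1]
```

Splitting the ids this way keeps the easy case (copy the single known value) apart from the case that needs a rule or a manual check.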



I would like to fill in the missing information (example 1) or correct the information (example 2) by id.
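
One possible approach, sketched below in base R, is to take the most frequent non-missing value within each id and use it for every row of that id. This assumes the majority value is the correct one, which is an assumption and not something the data can confirm. The names `claims`, `modal_value` and `gender_clean` are hypothetical, and the toy data only mirrors the gender column of the two examples.

```r
# Toy data mirroring examples 1 and 2 (names are hypothetical)
claims <- data.frame(
  id     = rep(c(1, 2), each = 6),
  gender = c("F", NA, "F", "F", "F", "F",
             "F", "F", "F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of x,
# or NA if every value is missing. Ties go to whichever value
# table() lists first, so tied ids should be reviewed by hand.
modal_value <- function(x) {
  counts <- table(x)                 # table() drops NA by default
  if (length(counts) == 0) return(NA_character_)
  names(counts)[which.max(counts)]
}

# One modal value per id, then looked up for every row by id
mode_by_id <- tapply(claims$gender, claims$id, modal_value)
claims$gender_clean <- as.vector(mode_by_id[as.character(claims$id)])
```

Writing the result to a new column keeps the original values available, so the rows that were changed can still be audited. Ids where all values are missing stay NA and are left for the later imputation step.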



I do not want to impute here; that will come later for the values that are truly missing.



What would you recommend for working with this type of data management problem?



Thanks in advance,



Jose
 		 	   		  
