Dear R users,

I am working with a data set of approximately 5 million rows that contains inconsistencies. The data.frame has one observation per claim and approximately 2 million unique IDs; an individual can have one or more claims.

I have found that an individual may have complete information in some, but not all, of his/her claims, as in example 1:

id: 1
gender  birthdate2
F       1994-01-28
<NA>
F       1994-01-28
F       1994-01-28
F       1994-01-28
F       1994-01-28

Or the individual may have all of his/her information, but with what appears to be a data entry mistake, as in the last row of the gender column in example 2:

id: 2
gender  birthdate2
F       2008-07-02
F       2008-07-02
F       2008-07-02
F       2008-07-02
F       2008-07-02
M       2008-07-02

These are two examples of the mixed situations I have found. I would like to fill in the missing information (example 1) or correct the inconsistent information (example 2) by id. I do not want to impute at this stage; that will come later for the values that are truly missing.

What would you recommend for this type of data management problem?

Thanks in advance,
Jose
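
P.S. In case a reproducible example helps, here is a minimal base R sketch of the two examples above and of the kind of by-id fix I am after. It rests on assumptions: I treat birthdate2 as missing in the incomplete row of id 1, the helper name mode_value is made up for illustration, and "use the most frequent non-missing value within each id" is only one possible rule, not something I have settled on or tested on the full 5 million rows.

```r
# Toy data mirroring the two examples (12 rows instead of ~5 million).
# Assumption: birthdate2 is also missing in the incomplete row of id 1.
claims <- data.frame(
  id         = rep(c(1, 2), each = 6),
  gender     = c("F", NA, "F", "F", "F", "F",
                 "F", "F", "F", "F", "F", "M"),
  birthdate2 = c("1994-01-28", NA, rep("1994-01-28", 4),
                 rep("2008-07-02", 6)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of x.
# Ties go to the value that sorts first, because table() sorts its names.
mode_value <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)  # nothing to fill from
  names(which.max(table(x)))
}

# Assumed rule: within each id, replace every value by that id's mode.
fixed <- claims
for (col in c("gender", "birthdate2")) {
  fixed[[col]] <- ave(claims[[col]], claims$id,
                      FUN = function(x) rep(mode_value(x), length(x)))
}

print(fixed)
```

With this rule, an id whose values are all missing stays NA, so those rows would still be left for the later imputation step.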