Hello,

I have the following problem. The data frame TEST has several rows for the same person because the values of Nom or Prenom differ between those rows, while the values of Matricule, Sexe and Date.de.naissance are the same.

TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
    Nom = structure(c(1L, 2L, 2L, 4L, 8L, 5L, 6L, 9L, 3L, 3L, 7L),
        .Label = c("CHICHE", "GEOF", "GUTIER", "JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"),
        class = "factor"),
    Prenom = structure(c(8L, 3L, 4L, 5L, 1L, 2L, 2L, 9L, 6L, 7L, 7L),
        .Label = c("Edgar", "Elodie", "Jeanine", "Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"),
        class = "factor"),
    Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
        .Label = c("Féminin", "Masculin"), class = "factor"),
    Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
        .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"),
        class = "factor")),
    .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
    class = "data.frame", row.names = c(NA, -11L))

I would like to make the information consistent and build two data frames:

df1, which takes the values of Nom and Prenom from the first row of each person in TEST when they differ.
The other columns (Matricule, Sexe and Date.de.naissance) are unchanged:

df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
    Nom = structure(c(1L, 2L, 2L, 4L, 6L, 5L, 5L, 7L, 3L, 3L, 3L),
        .Label = c("CHICHE", "GEOF", "GUTIER", "JACQUE", "LANGUE", "TRU", "VINCENT"),
        class = "factor"),
    Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L),
        .Label = c("Edgar", "Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"),
        class = "factor"),
    Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
        .Label = c("Féminin", "Masculin"), class = "factor"),
    Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
        .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"),
        class = "factor")),
    .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
    class = "data.frame", row.names = c(NA, -11L))

df2, which takes the values of Nom and Prenom from the last row of each person in TEST when they differ. The other columns (Matricule, Sexe and Date.de.naissance) are unchanged:
df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
    Nom = structure(c(1L, 2L, 2L, 3L, 6L, 4L, 4L, 7L, 5L, 5L, 5L),
        .Label = c("CHICHE", "GEOF", "JACQUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"),
        class = "factor"),
    Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L),
        .Label = c("Edgar", "Elodie", "Jeannine", "Michel", "Michèle", "Michelle", "Victor"),
        class = "factor"),
    Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
        .Label = c("Féminin", "Masculin"), class = "factor"),
    Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
        .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"),
        class = "factor")),
    .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
    class = "data.frame", row.names = c(NA, -11L))

Thanks for your help,
Michel

-- 
Michel ARNAUD
Chargé de mission auprès du DRH
DGDRD-Drh - TA 174/04
Av Agropolis
34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31
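Before harmonising anything, it can help to list which persons actually have conflicting names. The sketch below is an illustration, not part of the original thread: it rebuilds TEST from the values shown above as a plain data.frame (accented labels written as \u escapes to avoid encoding trouble; factor level order may differ from the structure() version) and counts the distinct Nom/Prenom combinations per Matricule with base R's tapply().

```r
# Rebuild TEST from the values in the question (accents written as \u escapes)
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE", "LANGUE-LOPEZ",
          "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar", "Elodie",
             "Elodie", "Victor", "Michele", "Mich\u00e8le", "Mich\u00e8le"),
  Sexe = c("F\u00e9minin", "F\u00e9minin", "F\u00e9minin", "Masculin", "Masculin",
           "F\u00e9minin", "F\u00e9minin", "Masculin", "F\u00e9minin",
           "F\u00e9minin", "F\u00e9minin"),
  Date.de.naissance = c("18/11/1945", "04/03/1946", "04/03/1946", "30/03/1935",
                        "29/12/1936", "27/09/1947", "27/09/1947", "03/09/1940",
                        "07/12/1947", "07/12/1947", "07/12/1947"),
  stringsAsFactors = TRUE
)

# Number of distinct Nom/Prenom combinations for each Matricule
# (paste() turns the factors into their labels before comparing)
n_variants <- tapply(paste(TEST$Nom, TEST$Prenom), TEST$Matricule,
                     function(v) length(unique(v)))

# Matricule values whose rows disagree on Nom and/or Prenom
conflicting <- names(n_variants)[n_variants > 1]
print(conflicting)  # expected: "67", "90", "108"
```

Only the Matricule values returned here are affected by the choice between the first and the last row; every other person already has a single Nom/Prenom pair.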
On 24-07-2013, at 08:39, Arnaud Michel <michel.arnaud at cirad.fr> wrote:
> [...]
Something like this:

r1 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule), FUN = function(x) {
    x[, c("Nom", "Prenom")] <- x[1, c("Nom", "Prenom"), drop = TRUE]
    x
})))
rownames(r1) <- NULL
r1

r2 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule), FUN = function(x) {
    x[, c("Nom", "Prenom")] <- x[nrow(x), c("Nom", "Prenom"), drop = TRUE]
    x
})))
rownames(r2) <- NULL
r2

#> identical(r1, df1)
#[1] TRUE
#> identical(r2, df2)
#[1] TRUE

Note: I had to change the Prenom and Sexe columns because of encoding issues, but that shouldn't have any influence on the above.

Berend
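The two calls above differ only in which row of each group is copied, so they can be folded into one function. The sketch below is a hypothetical helper (the name harmonise_by_id and its arguments are made up for illustration) that keeps the same split()/lapply()/rbind() logic; TEST is rebuilt from the question's values as a plain data.frame, so identical() against df1/df2 is not expected to hold, only the values are.

```r
# Rebuild TEST from the values in the question (accents written as \u escapes)
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE", "LANGUE-LOPEZ",
          "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar", "Elodie",
             "Elodie", "Victor", "Michele", "Mich\u00e8le", "Mich\u00e8le"),
  Sexe = c("F\u00e9minin", "F\u00e9minin", "F\u00e9minin", "Masculin", "Masculin",
           "F\u00e9minin", "F\u00e9minin", "Masculin", "F\u00e9minin",
           "F\u00e9minin", "F\u00e9minin"),
  Date.de.naissance = c("18/11/1945", "04/03/1946", "04/03/1946", "30/03/1935",
                        "29/12/1936", "27/09/1947", "27/09/1947", "03/09/1940",
                        "07/12/1947", "07/12/1947", "07/12/1947"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: within each group defined by column `id`, copy the
# columns `cols` from the first or the last row of the group to every row
harmonise_by_id <- function(df, id, cols, which = c("first", "last")) {
  which <- match.arg(which)
  pieces <- lapply(split(df, df[[id]]), function(x) {
    src <- if (which == "first") 1L else nrow(x)
    # With drop = TRUE a one-row selection becomes a list (one value per
    # column), which is recycled down all rows of the group
    x[, cols] <- x[src, cols, drop = TRUE]
    x
  })
  # rbind() returns the groups in sorted id order; droplevels() removes
  # factor levels that no longer occur (e.g. "RIVIER" when keeping the first)
  out <- droplevels(do.call(rbind, pieces))
  rownames(out) <- NULL
  out
}

h1 <- harmonise_by_id(TEST, "Matricule", c("Nom", "Prenom"), "first")
h2 <- harmonise_by_id(TEST, "Matricule", c("Nom", "Prenom"), "last")
print(h1[h1$Matricule == 108, c("Matricule", "Nom", "Prenom")])
print(h2[h2$Matricule == 108, c("Matricule", "Nom", "Prenom")])
```

Because the pieces are stacked in the order of the split() groups, the output rows come back sorted by Matricule. That matches TEST here because TEST is already sorted by Matricule; for unsorted data the row order would change.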
Hi Michel,

You could try:

df1New <- droplevels(TEST[with(TEST, ave(seq_along(Matricule), Matricule, FUN = min)), ])
row.names(df1New) <- 1:nrow(df1New)
df2New <- droplevels(TEST[with(TEST, ave(seq_along(Matricule), Matricule, FUN = max)), ])
row.names(df2New) <- 1:nrow(df2New)
identical(df1New, df1)
#[1] TRUE
identical(df2New, df2)
#[1] TRUE

A.K.

----- Original Message -----
From: Arnaud Michel <michel.arnaud at cirad.fr>
To: R help <r-help at r-project.org>
Sent: Wednesday, July 24, 2013 2:39 AM
Subject: [R] Change values in a dateframe

[...]
Hi Arun,

Thank you!

Best regards,
Michel

On 24/07/2013 15:29, arun wrote:
> [...]