thr3ads.net - R help - [R] Change values in a dateframe [Jul 2013]

Arnaud Michel

2013-Jul-24 06:39 UTC

[R] Change values in a dateframe

Hello

I have the following problem:
The data frame TEST has multiple lines for the same person because
there are different values of Nom or different values of Prenom,
while the values of Matricule, Sexe and Date.de.naissance are the same.

TEST <- structure(list(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = structure(c(1L, 2L, 2L, 4L, 8L, 5L, 6L, 9L, 3L, 3L, 7L),
    .Label = c("CHICHE", "GEOF", "GUTIER", "JACQUE", "LANGUE", "LANGUE-LOPEZ",
               "RIVIER", "TRU", "VINCENT"), class = "factor"),
  Prenom = structure(c(8L, 3L, 4L, 5L, 1L, 2L, 2L, 9L, 6L, 7L, 7L),
    .Label = c("Edgar", "Elodie", "Jeanine", "Jeannine", "Michel", "Michele",
               "Michèle", "Michelle", "Victor"), class = "factor"),
  Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
    .Label = c("Féminin", "Masculin"), class = "factor"),
  Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
    .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945",
               "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")),
  .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
  class = "data.frame", row.names = c(NA, -11L))
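
One way to see which persons are affected is to count the distinct (Nom, Prenom) pairs recorded for each Matricule. The following is only a minimal sketch on a cut-down, made-up copy of the data; the names `toy`, `n_variants` and `inconsistent` are illustrative and do not appear in the thread.

```r
# Hypothetical cut-down data with the same column names as TEST.
toy <- data.frame(
  Matricule = c(66L, 67L, 67L, 108L, 108L, 108L),
  Nom       = c("CHICHE", "GEOF", "GEOF", "GUTIER", "GUTIER", "RIVIER"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine",
                "Michele", "Michelle", "Michelle"),
  stringsAsFactors = TRUE
)

# Number of distinct (Nom, Prenom) pairs recorded for each Matricule.
n_variants <- tapply(
  paste(toy$Nom, toy$Prenom),
  toy$Matricule,
  function(v) length(unique(v))
)

# Matricule values that carry more than one spelling.
inconsistent <- names(n_variants)[n_variants > 1]
print(inconsistent)
```

Applied to the full TEST data frame, the same idea should flag the Matricule values whose rows need to be harmonised.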


I would like to make the information homogeneous and build two
data frames:
df1, which takes the values of Nom and Prenom from the first line of TEST
for each person when there are different values. The other values
(Matricule, Sexe and Date.de.naissance) are unchanged.

df1 <- structure(list(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = structure(c(1L, 2L, 2L, 4L, 6L, 5L, 5L, 7L, 3L, 3L, 3L),
    .Label = c("CHICHE", "GEOF", "GUTIER", "JACQUE", "LANGUE", "TRU",
               "VINCENT"), class = "factor"),
  Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L),
    .Label = c("Edgar", "Elodie", "Jeanine", "Michel", "Michele", "Michelle",
               "Victor"), class = "factor"),
  Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
    .Label = c("Féminin", "Masculin"), class = "factor"),
  Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
    .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945",
               "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")),
  .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
  class = "data.frame", row.names = c(NA, -11L))

df2, which takes the values of Nom and Prenom from the last line of TEST
for each person when there are different values. The other values
(Matricule, Sexe and Date.de.naissance) are unchanged.

df2 <- structure(list(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = structure(c(1L, 2L, 2L, 3L, 6L, 4L, 4L, 7L, 5L, 5L, 5L),
    .Label = c("CHICHE", "GEOF", "JACQUE", "LANGUE-LOPEZ", "RIVIER", "TRU",
               "VINCENT"), class = "factor"),
  Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L),
    .Label = c("Edgar", "Elodie", "Jeannine", "Michel", "Michèle", "Michelle",
               "Victor"), class = "factor"),
  Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
    .Label = c("Féminin", "Masculin"), class = "factor"),
  Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
    .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945",
               "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")),
  .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
  class = "data.frame", row.names = c(NA, -11L))

Thanks for your help
Michel

-- 
Michel ARNAUD
Chargé de mission auprès du DRH
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31

Berend Hasselman

2013-Jul-24 07:48 UTC

Something like this:

# For each Matricule, copy Nom and Prenom from the first row of the group.
r1 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule),
    FUN = function(x) {
        x[, c("Nom", "Prenom")] <- x[1, c("Nom", "Prenom"), drop = TRUE]
        x
    })))
rownames(r1) <- NULL
r1

# Same, but taking Nom and Prenom from the last row of the group.
r2 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule),
    FUN = function(x) {
        x[, c("Nom", "Prenom")] <- x[nrow(x), c("Nom", "Prenom"), drop = TRUE]
        x
    })))
rownames(r2) <- NULL
r2

#> identical(r1,df1)
#[1] TRUE
#> identical(r2,df2)
#[1] TRUE

Note: I had to change the Prenom and Sexe columns because of encoding issues,
but that shouldn't have any influence on the above.
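
The split / lapply / rbind pattern used above can also be written out step by step. What follows is only a sketch on a cut-down, made-up copy of the data; the helper `harmonise()` and its `pick` argument are illustrative names, not something from the thread.

```r
# Hypothetical cut-down data with the same column names as TEST.
toy <- data.frame(
  Matricule = c(66L, 67L, 67L, 108L, 108L, 108L),
  Nom       = c("CHICHE", "GEOF", "GEOF", "GUTIER", "GUTIER", "RIVIER"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine",
                "Michele", "Michele", "Michelle"),
  stringsAsFactors = TRUE
)

# Illustrative helper: copy Nom and Prenom from the row chosen by `pick`
# to every row of each Matricule group.
harmonise <- function(df, pick) {
  groups <- split(df, df$Matricule)      # one data frame per person
  fixed <- lapply(groups, function(g) {
    k <- pick(g)                         # row index inside the group
    g$Nom    <- g$Nom[k]                 # recycled over the whole group
    g$Prenom <- g$Prenom[k]
    g
  })
  out <- do.call(rbind, fixed)           # stack the groups again
  rownames(out) <- NULL
  droplevels(out)                        # drop spellings no longer used
}

first <- harmonise(toy, function(g) 1L)  # names from the first row
last  <- harmonise(toy, nrow)            # names from the last row
```

Note that split() orders the groups by Matricule, so the rows come back in the input order only when the input is already sorted by Matricule, as TEST is.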

Berend

arun

2013-Jul-24 13:29 UTC

Hi Michel,
You could try:


df1New <- droplevels(TEST[with(TEST, ave(seq_along(Matricule), Matricule, FUN = min)), ])
row.names(df1New) <- 1:nrow(df1New)
df2New <- droplevels(TEST[with(TEST, ave(seq_along(Matricule), Matricule, FUN = max)), ])
row.names(df2New) <- 1:nrow(df2New)
identical(df1New, df1)
#[1] TRUE
identical(df2New, df2)
#[1] TRUE
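
To make the ave() call easier to follow, here is a small sketch of the index vectors it produces. It runs on a cut-down, made-up copy of the data; `toy`, `idx_first` and `idx_last` are illustrative names only.

```r
# Hypothetical cut-down data with the same column names as TEST.
toy <- data.frame(
  Matricule = c(66L, 67L, 67L, 108L, 108L, 108L),
  Nom       = c("CHICHE", "GEOF", "GEOF", "GUTIER", "GUTIER", "RIVIER"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine",
                "Michele", "Michele", "Michelle"),
  stringsAsFactors = TRUE
)

# Row number of the first and of the last record of each Matricule,
# repeated for every row belonging to that Matricule.
idx_first <- ave(seq_along(toy$Matricule), toy$Matricule, FUN = min)
idx_last  <- ave(seq_along(toy$Matricule), toy$Matricule, FUN = max)
print(idx_first)  # 1 2 2 4 4 4
print(idx_last)   # 1 3 3 6 6 6

# Indexing with these vectors repeats whole rows, so every column
# (not only Nom and Prenom) is taken from the chosen record.
first <- droplevels(toy[idx_first, ])
row.names(first) <- 1:nrow(first)
```

Because whole rows are copied, this relies on the other columns being constant within a Matricule, which is the case described in the question. Unlike split(), ave() keeps the original row order even when the data are not sorted by Matricule.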
A.K.




Arnaud Michel

2013-Jul-24 14:31 UTC

Hi Arun,
Thank you.
Best regards,
Michel
