Bastien.Ferland-Raymond at mffp.gouv.qc.ca
2015-Jan-26 16:30 UTC
[R] Weird behavior of aggregate() function
Hello list,

I have found a weird behavior of the aggregate() function when used with characters. I think the problem has to do with converting characters to factors.

I'm trying to aggregate a character vector using a homemade function. My function gives me all the possible pairs of modalities observed.

Reproducible code:

#######
### my grouping variable
gr <- c("A","A","B","B","C","C","C","D","D","E","E","E")
### my variable
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")

### what the table would look like
cbind(gr,vari)

### My function that gives every possible pair of variables (my real function can go up to length(TE)==5, but for the sake of the example, I've reduced it here)
faire.paires <- function(TE){
  gg <- rbind(c(TE[1],TE[2]),
              c(TE[1],TE[3]))
  gg <- gg[rowSums(is.na(gg))==0,,drop=FALSE]
  gg
}

### The function gives exactly what I want when I run it on a specific entry
faire.paires(TE = vari[gr=="B"])

### But with aggregate(), it transforms everything into integers
res <- aggregate(list(TE = vari), by=list(gr),faire.paires)
res
str(res)

### it's like it's using factors, then losing the key that tells me which integer
### means which modality

### if I give it factors directly:
res2 <- aggregate(list(TE = as.factor(vari)), by=list(gr),faire.paires)
res2
str(res2)

### does not fix the problem.
############

Any idea?

I know my function may not be the best or most efficient way to succeed. However, I'm still puzzled as to why aggregate gives me this weird output.

Best regards,

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Forêts, de la Faune et des Parcs
?aggregate informs you that unless x is a time series it will be converted to a data.frame. data.frame will convert your character to a factor unless you tell it not to. You can prevent this by converting vari to a data.frame yourself, passing the stringsAsFactors argument, like this:

aggregate(data.frame(TE = vari, stringsAsFactors = FALSE), by=list(gr),faire.paires)

Best,
Ista

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
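The conversion Ista describes can be checked directly. The sketch below is an illustration and not part of the original thread: it reuses gr, vari and faire.paires from the first message, and it sets stringsAsFactors explicitly because the TRUE default that triggered the conversion applied to R versions before 4.0.0. The names as_factor, as_char and res are hypothetical.

```r
# Data and function from the original post
gr   <- c("A","A","B","B","C","C","C","D","D","E","E","E")
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")

faire.paires <- function(TE) {
  gg <- rbind(c(TE[1], TE[2]),
              c(TE[1], TE[3]))
  # keep only the rows that contain no NA
  gg[rowSums(is.na(gg)) == 0, , drop = FALSE]
}

# What the data.frame conversion produced under the old default:
# the character column is stored as a factor.
as_factor <- data.frame(TE = vari, stringsAsFactors = TRUE)
class(as_factor$TE)   # "factor"

# Building the data frame yourself keeps the column as character.
as_char <- data.frame(TE = vari, stringsAsFactors = FALSE)
class(as_char$TE)     # "character"

# With a character column, faire.paires receives strings for each group.
res <- aggregate(as_char, by = list(gr = gr), FUN = faire.paires)
str(res)
```

Because the groups yield different numbers of pairs, the TE column of res should come back as a list with one character matrix per group.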
Bastien.Ferland-Raymond at mffp.gouv.qc.ca
2015-Jan-26 18:52 UTC
[R] Weird behavior of aggregate() function
Thanks Ista for your help, it works and I understand why.

However, I'm still confused about why the previous code lost the "factor key". It could just have converted to factors and output factors, but instead it's outputting integers...

I'm not a very big fan of the default stringsAsFactors=TRUE, but that's another debate.

Anyway, thanks again,

Bastien

-----Original Message-----
From: Ista Zahn [mailto:istazahn at gmail.com]
Sent: January 26, 2015 11:51
To: Ferland-Raymond, Bastien (DIF)
Cc: r-help at r-project.org
Subject: Re: [R] Weird behavior of aggregate() function

?aggregate informs you that unless x is a time series it will be converted to a data.frame. data.frame will convert your character to a factor unless you tell it not to. You can prevent this by converting vari to a data.frame yourself, passing the stringsAsFactors argument, like this:

aggregate(data.frame(TE = vari, stringsAsFactors = FALSE), by=list(gr),faire.paires)

Best,
Ista
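On the follow-up question about the lost "factor key": the labels appear to be dropped inside faire.paires and not by aggregate() itself. The help page ?cbind states that cbind() and rbind() discard the classes of their inputs and replace factors by their internal codes, so a matrix built from factor elements can only hold the integer codes. A minimal sketch, where f, m and faire.paires.chr are hypothetical names introduced for illustration:

```r
# A small factor; its levels are sorted alphabetically: "mj1", "mj2"
f <- factor(c("mj2", "mj1"))

# rbind() builds a matrix, and a matrix cannot carry factor levels,
# so only the integer codes survive.
m <- rbind(f, f)
is.factor(m)   # FALSE
typeof(m)      # "integer"

# Hypothetical variant of faire.paires that converts to character first,
# so it returns labels whether it receives a factor or a character vector.
faire.paires.chr <- function(TE) {
  TE <- as.character(TE)
  gg <- rbind(c(TE[1], TE[2]),
              c(TE[1], TE[3]))
  # keep only the rows that contain no NA
  gg[rowSums(is.na(gg)) == 0, , drop = FALSE]
}

faire.paires.chr(f)   # a 1 x 2 character matrix holding "mj2" and "mj1"
```

Converting inside the function is one option; passing a character column to aggregate(), as Ista suggests, avoids the factor altogether.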