marcos carvajalino
2009-Oct-26 20:39 UTC
[R] issue with levels of a factor after subsetting
Hi

Second question in a day, I'm beginning to feel incompetent...

This time I'm having a weird problem. I'm importing the following database:

> car<-read.csv2("Historicos.csv")
'data.frame':   1818 obs. of  6 variables:
 $ Dpto  : Factor w/ 11 levels "ANTIOQUIA","ATLÁNTICO",..: 2 2 2 2 2 1 1 1 1 5 ...
 $ Rio   : Factor w/ 43 levels "Acandí","Anchicayá",..: 26 26 26 26 26 4 4 4 4 39 ...
 $ Var   : Factor w/ 13 levels "CAUDAL","CD",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Valor : num  7150 7150 7121 7121 7121 ...
 $ Año   : int  2002 2003 2004 2009 2005 2002 2003 2004 2005 2009 ...
 $ Región: Factor w/ 2 levels "CARIBE","PACIFICO": 1 1 1 1 1 1 1 1 1 2 ...

The variable "Rio" contains the names of 43 rivers in Colombia. Now my boss wants me to show just 4 of them in one graph and the other 39 in another, so I subsetted them using the following code:

#The first 4 Rivers
> car4<-car[car$Rio%in%c("Magdalena","Atrato","San Juan","Mira"),]
#The other 39
> car5<-car[!car$Rio%in%c("Magdalena","Sinú","Atrato","San Juan","Mira","Micay","Patia","Canal del Dique","Iscuandé","Guapi"),]

And I plot the two graphs using:

xyplot(Valor~Año|Var,groups=Rio,data=car4[car4$Var%in%c("NT","PO4","HDD","CTE",
"SST","OCT"),],layout=c(2,3),subscripts=T,scale=list(y=list(relation="free")),type="b")

xyplot(Valor~Año|Var,groups=Rio,data=car5[car5$Var%in%c("NT","PO4","HDD","CTE",
"SST","OCT"),],layout=c(2,3),subscripts=T,scale=list(y=list(relation="free")),type="b")

Up to that point everything was going smoothly, but then I tried to add a custom key using

key=list(corner=c(1,1),border=T,lines=T,text=list(levels(car4$Rio)))

and I was very surprised when, instead of the expected 4 river names, I got all 43 in the legend.

I thought it was my fault and that I had missed something in the key instruction, but when I checked the structure of the car4 data frame (the one with just the 4 selected rivers) I found this:

> str(car4)
'data.frame':   230 obs. of  6 variables:
 $ Dpto  : Factor w/ 11 levels "ANTIOQUIA","ATLÁNTICO",..: 2 2 2 2 2 1 1 1 1 5 ...
 $ Rio   : Factor w/ 43 levels "Acandí","Anchicayá",..: 26 26 26 26 26 4 4 4 4 39 ...
 $ Var   : Factor w/ 13 levels "CAUDAL","CD",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Valor : num  7150 7150 7121 7121 7121 ...
 $ Año   : num  2002 2003 2004 2009 2005 ...
 $ Región: Factor w/ 2 levels "CARIBE","PACIFICO": 1 1 1 1 1 1 1 1 1 2 ...

The new data frame (car4) kept the factor levels of the old data frame (car). How can I drop the unused levels from the new data frame and keep just the 4 selected ones?

Thanks in advance...

--
Marcos Antonio Carvajalino Fernández
Estudiante de Ingeniería Ambiental y Sanitaria
Universidad del Magdalena, Colombia
marcos carvajalino
2009-Oct-26 21:01 UTC
[R] issue with levels of a factor after subsetting
Thanks Phil, exactly what I was looking for!!

2009/10/26 Phil Spector <spector at stat.berkeley.edu>:
> Marcos -
>    Either refer to
>
> levels(factor(car4$Rio))
>
> or
>
> levels(car4$Rio[,drop=TRUE])
>
> to show only the levels that are present in the variable.
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spector at stat.berkeley.edu
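
For anyone finding this thread later, here is a minimal sketch of the behaviour and of Phil's two suggestions. It uses a small toy data frame, not the original Historicos.csv: the river names are borrowed from the post, but the Valor numbers and the helper names lv_factor and lv_drop are made up for illustration. The last step uses droplevels(), which as far as I know only exists in R 2.12.0 and later, so treat that part as an assumption about your R version.

```r
# Toy stand-in for the poster's data; the Valor numbers are invented.
car <- data.frame(
  Rio   = factor(c("Magdalena", "Atrato", "Micay", "Mira", "Guapi", "Magdalena")),
  Valor = c(7150, 4900, 373, 868, 357, 7121)
)

# Subsetting rows keeps every level of the original factor,
# including levels that no longer occur in the subset.
car4 <- car[car$Rio %in% c("Magdalena", "Atrato", "Mira"), ]
n_before <- nlevels(car4$Rio)

# Phil's two suggestions: rebuild the factor, or index with drop = TRUE.
# Both return only the levels that are actually present.
lv_factor <- levels(factor(car4$Rio))
lv_drop   <- levels(car4$Rio[, drop = TRUE])

# To fix the data frame itself rather than just the legend text,
# overwrite the column with the rebuilt factor ...
car4$Rio <- factor(car4$Rio)

# ... or (assuming R >= 2.12.0) drop unused levels from all factor columns.
car4 <- droplevels(car4)
n_after <- nlevels(car4$Rio)
```

Either form can then go straight into the key, e.g. text=list(levels(factor(car4$Rio))). Overwriting the column is probably the more convenient choice when the same subset is reused in several plots.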