Karen Kotschy
2005-Jul-12 10:50 UTC
[R] adding a factor column based on levels of another factor
Hi R users

Does anyone out there have a better/quicker way of adding a factor column
to a data frame based on levels of another factor?

I have a (large) data frame consisting of records for individual plants,
each represented by a unique ID number. The species of each plant is
indicated in the column "species", which is a factor column with many
different levels (species). There are multiple records for each species,
and there is no pattern to the order in which the species names appear in
the data frame.

e.g.
  uniqueID species elev  ht diam
1        1     sp2  3.5 1.3   55
2        2     sp2  4.2 0.5   15
3        3     sp3  3.2 1.0   13
4        4    sp65  2.2 2.0   14
5        5    sp43  5.4 5.7   20
6        6     sp2  2.5 4.1   32
7        7    sp12  1.1 0.9    5
8        8     sp3  3.4 3.6    2

I would like to add a factor column to this data frame, indicating to
which group each individual belongs. All individuals of the same species
will belong to the same group.

Is there a quick way of saying "for all instances of species1, give the
value 5, for all instances of species2, give the value 4, etc" (where 5
and 4 are levels of a factor)?

The only way I can think of doing it is to split the data frame by
species, then add a column to each subset showing the group, then
re-join all the subsets. This seems clumsy and prone to errors. Anyone
know a better way?

I've looked at expand.grid and gl but they don't seem to do what I want.

Thanks!

Karen Kotschy
Centre for Water in the Environment
University of the Witwatersrand
Johannesburg
South Africa
Henrik Andersson
2005-Jul-12 11:53 UTC
[R] adding a factor column based on levels of another factor
First create a data frame with the translation you want, i.e. one column
with the species and another with the number you want in the end.

Then merge these two data frames using 'merge' and voilà.

I would start by looking at ?merge

Cheers, Henrik Andersson

Karen Kotschy wrote:
> Does anyone out there have a better/quicker way of adding a factor column
> to a data frame based on levels of another factor?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
---------------------------------------------
Henrik Andersson
Netherlands Institute of Ecology -
Centre for Estuarine and Marine Ecology
P.O. Box 140
4400 AC Yerseke
Phone: +31 113 577473
h.andersson at nioo.knaw.nl
http://www.nioo.knaw.nl/ppages/handersson
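The merge approach described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame is a cut-down copy of the example data (without ht and diam), and the translation table `lookup` with its species-to-group values is hypothetical, made up for this example.

```r
## Example data (cut down from the original post); species is a factor
dat <- data.frame(
  uniqueID = 1:8,
  species  = factor(c("sp2", "sp2", "sp3", "sp65", "sp43", "sp2", "sp12", "sp3")),
  elev     = c(3.5, 4.2, 3.2, 2.2, 5.4, 2.5, 1.1, 3.4)
)

## Hypothetical translation table: one row per species with its group
lookup <- data.frame(
  species = c("sp12", "sp2", "sp3", "sp43", "sp65"),
  group   = factor(c("3", "4", "2", "5", "1"))
)

## Join on the shared column "species"; all.x = TRUE keeps plants whose
## species is missing from the lookup (their group becomes NA)
merged <- merge(dat, lookup, by = "species", all.x = TRUE)

## merge() sorts the result by the key, so restore the original row order
merged <- merged[order(merged$uniqueID), ]
print(merged)
```

Because the mapping lives in its own small table, it can be checked or edited without touching the plant records.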
Christoph Buser
2005-Jul-12 15:52 UTC
[R] adding a factor column based on levels of another factor
Hi Karen

I am not sure whether I understand your question correctly. If not, please
ignore this answer.

Do you want a new factor "group" which contains the same information as
"species", just with other names, e.g. 1, 2, ... or "A", "B", ...?

If yes, you can do it like this:

## Your data.frame (without ht & diam)
## species is created explicitly as a factor so that its levels can be
## reassigned below
dat <- data.frame(uniqueID = factor(1:8),
                  species = factor(c("sp2", "sp2", "sp3", "sp65",
                                     "sp43", "sp2", "sp12", "sp3")),
                  elev = c(3.5, 4.2, 3.2, 2.2, 5.4, 2.5, 1.1, 3.4))
str(dat)

## new factor group (copy of species)
dat[,"group"] <- dat[,"species"]

## rename the levels into "1", "2", ... or whatever you want:
levels(dat[,"group"]) <- list("3" = "sp12", "4" = "sp2", "2" = "sp3",
                              "5" = "sp43", "1" = "sp65")

## control
dat[,"species"]
dat[,"group"]

Please be careful. This only changes the labels into 1, 2, ...
If you use, for example, as.numeric(dat[,"group"]) you will get the
internal integer codes, which follow the original alphabetical ordering
of the levels, meaning "sp12" is 1, "sp2" is 2, etc.
You can change this too, if necessary, by applying as.character() first
and then as.numeric().

I hope this is helpful for your problem.

Regards,

Christoph Buser

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------
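The warning about labels versus internal codes in the message above can be illustrated with a short sketch. It reuses the same level mapping on a standalone vector; the variable names `codes` and `values` are introduced here for illustration only.

```r
## Standalone factor with the same species as the example data
species <- factor(c("sp2", "sp2", "sp3", "sp65", "sp43", "sp2", "sp12", "sp3"))

## Copy it and relabel the levels with the mapping from the message
group <- species
levels(group) <- list("3" = "sp12", "4" = "sp2", "2" = "sp3",
                      "5" = "sp43", "1" = "sp65")

## Internal integer codes: the position of each level, not its label
codes <- as.numeric(group)
print(codes)    # 2 2 3 5 4 2 1 3

## Numeric value of the labels: go through as.character() first
values <- as.numeric(as.character(group))
print(values)   # 4 4 2 1 5 4 3 2
```

If the group numbers are needed for arithmetic or sorting, the second form is usually the one wanted; the first only reflects the order in which the levels are stored.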