# I have a dataset with two factors. I want to combine those factors into
# a single factor and count the number of data values at each level of the
# new factor. The following gives a comparable data frame:
a <- rep(c("a", "b"), c(6,6))
b <- rep(c("c", "d"), c(6,6))
df <- data.frame(f1=a, f2=b, d=rnorm(12))
df
# I use the 'interaction' function to combine factors f1 and f2:
df2 <- data.frame(f3=interaction(df[,"f1"], df[,"f2"]),
d=df[,"d"])
df2
# A count of the first data frame using factor f1 returns the kind of
# results I am looking for:
count <- as.data.frame(table(df$f1))
count
# Var1 Freq
#1 a 6
#2 b 6
# As does a count using factor f2:
count2 <- as.data.frame(table(df$f2))
count2
# Var1 Freq
#1 c 6
#2 d 6
# The same procedure on the second data frame does not treat the levels
# of factor f3 discretely, instead giving all possible combinations of f1
# and f2.
count3 <- as.data.frame(table(df2$f3))
count3
# Var1 Freq
#1 a.c 6
#2 b.c 0
#3 a.d 0
#4 b.d 6
I need the results to be:
# Var1 Freq
#1 a 6
#2 b 6
# Any suggestions?
--
Sam Player, B.Sc.(Hons.) B.A.
Ph.D. Candidate, Faculty of Agriculture, Food & Natural Resources,
University of Sydney
Email: splayer at usyd.edu.au
Agroecosystems Research Group
Room 214 J.R.A. McMillan Building A05
University of Sydney NSW 2006, Australia
Angkor Research Program
Room 305 Old Teachers College A22
University of Sydney NSW 2006, Australia

Sam,

Depending on what your ultimate aim is, perhaps you just want to add the
'drop=TRUE' argument to your interaction call.

Peter
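
A minimal sketch of the 'drop=TRUE' suggestion, rebuilding the data frame
from the original post. The set.seed() call and the stringsAsFactors = TRUE
argument are assumptions added here (the first for reproducibility, the
second so that f1 and f2 are factors regardless of R version); neither
appears in the thread.

```r
# Rebuild the example data from the original post.
set.seed(1)  # assumption: added only so rnorm() is reproducible
a <- rep(c("a", "b"), c(6, 6))
b <- rep(c("c", "d"), c(6, 6))
df <- data.frame(f1 = a, f2 = b, d = rnorm(12),
                 stringsAsFactors = TRUE)  # assumption: force factor columns

# drop = TRUE discards combinations of f1 and f2 that never occur,
# so only the observed levels a.c and b.d remain.
df2 <- data.frame(f3 = interaction(df$f1, df$f2, drop = TRUE), d = df$d)

# Count the observations at each remaining level.
count3 <- as.data.frame(table(df2$f3))
count3
```

With the unused levels dropped, the count has one row per combination that
is actually present in the data.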

On Sep 19, 2009, at 5:39 AM, Sam Player wrote:

> # The same procedure on the second dataframe does not treat the
> levels of factor f3 discretely, instead giving all possible
> combinations of f1 and f2.

We appear to have a different understanding of the term "discrete". The
interaction function produces all possible combinations of factors and
then table() counts the occurrences of such.

> count3 <- as.data.frame(table(df2$f3))
> count3
>
> # Var1 Freq
> #1 a.c 6
> #2 b.c 0
> #3 a.d 0
> #4 b.d 6
>
> I need the results to be:
>
> # Var1 Freq
> #1 a 6
> #2 b 6

Puzzled. You already have such. Why would you want the interaction
function to behave differently? Did you just want to create a label from
f1 and f2? That can be achieved:

 > df2 <- df
 > df2$f12 <- with( df2, paste(f1,f2,sep=".") )
 > df2
   f1 f2           d f12
1   a  c -0.52902802 a.c
2   a  c -1.07351118 a.c
3   a  c  0.63463011 a.c
4   a  c  0.26857599 a.c
5   a  c  1.57677999 a.c
6   a  c  1.08645153 a.c
7   b  d -0.60400852 b.d
8   b  d -0.06611533 b.d
9   b  d  1.00787048 b.d
10  b  d  1.48289305 b.d
11  b  d  0.54658888 b.d
12  b  d -0.67630052 b.d
 > count3 <- as.data.frame(table(df2$f12))
 > count3
  Var1 Freq
1  a.c    6
2  b.d    6

> # Any suggestions?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
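
The point about interaction() and table() can be seen by comparing the
levels of the combined factor with the values actually observed. This is
an illustrative sketch only; the object names f3 and f12 follow the
thread, and the vectors are used directly rather than via a data frame to
keep it short.

```r
# The two grouping vectors from the original post.
a <- rep(c("a", "b"), c(6, 6))
b <- rep(c("c", "d"), c(6, 6))

# interaction() builds a factor whose levels are every f1 x f2 combination.
f3 <- interaction(a, b)
levels(f3)                 # four levels, observed or not
unique(as.character(f3))   # only the two combinations present in the data

# table() reports one cell per level, so unobserved levels appear as 0.
table(f3)

# paste() returns a character vector, which carries no unused levels,
# so table() only sees the labels that actually occur.
f12 <- paste(a, b, sep = ".")
table(f12)
```

The paste() approach yields a plain character label, whereas interaction()
keeps a factor; which is preferable depends on what is done with the
result afterwards.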