Conor Robinson
2007-Aug-01 21:12 UTC
[R] new user question on dataframe comparisons and plots
I'm coming from the scipy community and have been using R on and off for the past week or so. I'm still feeling out the language structure, but so far so good. I apologize in advance if I pose any obvious questions, due to my current lack of vocabulary when searching for my issue, or recognizing the answer if I did see it.

Question 1, plots:

I have a data frame with 4 factor columns and a single logical column holding the response data (T or F). I would like to plot a 4*4 grid showing all the two-way attribute interactions, as with plot(data.frame) or pairs(data.frame, panel=panel.smooth), but with the response's TRUE and FALSE shown in different colors, or any other built-in graphical analysis that might be relevant in this case. I'm sure this is simple since this is a common procedure; thanks in advance for humoring me. Also, what is the correct term for this type of plot?

Question 2, data frame analysis:

I have two sub data frames split by whether my logical column is T or F. I want to compare the same factor column between the two sub data frames (there are a few hundred unique possible values for this factor column, e.g. AAAA - ZZZZ enumerated). I've used table() on the attribute columns from each sub frame to get counts.

pos <- data.frame(table(df.true$CAT))

AAAA 10
BASD 0
ZAQM 4
...

neg <- data.frame(table(df.false$CAT))

AAAA 1000
BASD 3
ZAQM 9
PPWS 10
...

The TRUE sub frame has fewer unique factor levels than the FALSE sub frame. I would like an output data frame in which one column holds all the factors from the TRUE sub frame and the second column holds the counts from the TRUE attributes / counts from the corresponding FALSE attributes, i.e. %response for each represented factor. It's fine (better even) if all factors are included and there is just a zero for the attributes with no TRUEs.

I've been going off making my own function and running into trouble with the data frame not being a vector etc., but I have a feeling there is a *much* better way, i.e. a built-in function; I've hit my current level of R understanding.

Thank you,
Conor
Stephen Tucker
2007-Aug-02 05:55 UTC
[R] new user question on dataframe comparisons and plots
Hi Conor,

I hope I interpreted your question correctly. I think for the first one you are looking for a conditioning plot? I am going to create and use some nonsensical data - 'iris' comes with R so this should be reproducible on your machine:

library(lattice)
data(iris)
x <- iris
# make some factors using cut()
x[,1:2] <- lapply(x[,1:2],cut,3)
# add column of TRUE FALSE
x <- cbind(x,TF=sample(c(TRUE,FALSE),nrow(x),replace=TRUE))
xyplot(Petal.Width~Petal.Length |   ## these are numeric
       Sepal.Width*Sepal.Length,    ## these are factors
       groups=TF,                   ## TRUE or FALSE
       panel=function(x,y,...) {
         panel.xyplot(x,y,...)
         panel.loess(x,y,...)
       },
       data=x,auto.key=TRUE)

merge() should work when you have different factors, when you specify all=TRUE.

## get counts for TRUE and FALSE
> y <- tapply(x$Species,INDEX=x$TF,
+             function(x) as.data.frame(table(x)))
## merge results
> (z <- `names<-`(merge(y$`TRUE`,y$`FALSE`,by="x",all=TRUE),
+                 c("factor","true","false")))
      factor true false
1 versicolor   29    21
2  virginica   23    27
## reshape the data frame
> library(reshape)
> melt(z,id=1)
      factor variable value
1 versicolor     true    29
2  virginica     true    23
3 versicolor    false    21
4  virginica    false    27

Hope this helps. If it doesn't you can post a small (reproducible) piece of data and we can maybe help you out a little better...

Best regards,
ST
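For the ratio asked about in Question 2, one possible shortcut is to cross-tabulate the factor against the logical column in a single table() call instead of building two sub frames and merging them. This is only a sketch under assumptions: the toy data frame 'df' and its column names 'CAT' and 'resp' are made up for illustration and stand in for the real data.

```r
# Hypothetical toy data; 'CAT' (factor) and 'resp' (logical) are assumed names.
df <- data.frame(
  CAT  = factor(c("AAAA", "AAAA", "BASD", "ZAQM", "ZAQM", "ZAQM", "PPWS")),
  resp = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Cross-tabulate once: rows are factor levels, columns are FALSE and TRUE.
# Fixing the levels guarantees both columns exist even if one value never occurs.
counts <- table(df$CAT, factor(df$resp, levels = c(FALSE, TRUE)))

# One row per factor level, including levels that have no TRUE rows.
result <- data.frame(
  factor = rownames(counts),
  true   = as.vector(counts[, "TRUE"]),
  false  = as.vector(counts[, "FALSE"]),
  stringsAsFactors = FALSE
)

# Ratio of TRUE counts to FALSE counts, as described in the question.
# A level with zero FALSE rows would give Inf (or NaN for 0/0).
result$ratio <- result$true / result$false

print(result)
```

Levels with no TRUE rows get a ratio of zero, which matches the "just a zero" request. If the share of TRUE among all rows of a level is what is wanted instead, true / (true + false) could be used in place of the last computation.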