thr3ads.net - R help - [R] new user question on dataframe comparisons and plots [Aug 2007]

Conor Robinson

2007-Aug-01 21:12 UTC

[R] new user question on dataframe comparisons and plots

I'm coming from the scipy community and have been using R on and off for
the past week or so.  I'm still feeling out the language structure,
but so far so good.  I apologize in advance if any of my questions are
obvious; I don't yet know the right terms to search for, or to
recognize the answer if I did see it.

Question 1, plots:

I have a data frame with four factor columns and a single logical
column holding the response data (T or F).  I would like to plot a 4*4
grid showing all the two-way attribute interactions, as with
plot(data.frame) or pairs(data.frame, panel=panel.smooth), but with the
response's TRUE and FALSE values shown in different colors.  Any other
built-in graphical analysis that might be relevant in this case would
also be welcome.  I'm sure this is simple since this is a common
procedure; thanks in advance for humoring me.  Also, what is the
correct term for this type of plot?


Question 2, data frame analysis:

I have two sub data frames, split by whether my logical column is T or
F.  I want to compare the same factor column between the two sub data
frames (this factor column has a few hundred unique levels, e.g.
AAAA - ZZZZ enumerated).  I've used table() on the attribute column
from each sub frame to get counts.

pos <- data.frame(table(df.true$CAT))

AAAA  10
BASD  0
ZAQM 4
...

neg <- data.frame(table(df.false$CAT))

AAAA 1000
BASD  3
ZAQM  9
PPWS 10
...

The TRUE sub frame has fewer unique factor levels than the FALSE sub
frame.  I would like an output data frame whose first column holds all
the factor levels from the TRUE sub frame and whose second column holds
the TRUE counts divided by the corresponding FALSE counts, i.e. the
%response for each represented level.  It's fine (better even) if all
levels are included and there is just a zero for the levels with no
TRUEs.

I've been going off making my own function and running into trouble
with the data frame not being a vector etc etc, but I have a feeling
there is a *much* better way ie built in function, but I've hit my
current level of R understanding.

Thank you,
Conor

Stephen Tucker

2007-Aug-02 05:55 UTC

[R] new user question on dataframe comparisons and plots

Hi Conor,

I hope I interpreted your question correctly. I think for the first one you
are looking for a conditioning plot? I am going to create and use some
nonsensical data - 'iris' comes with R so this should be reproducible on
your machine:

library(lattice)
data(iris)
x <- iris
# make some factors using cut(): bin the two sepal columns into 3 intervals
x[,1:2] <- lapply(x[,1:2],cut,3)
# add column of TRUE FALSE
x <- cbind(x,TF=sample(c(TRUE,FALSE),nrow(x),replace=TRUE))
xyplot(Petal.Width~Petal.Length | ## these are numeric
       Sepal.Width*Sepal.Length,  ## these are factors
       groups=TF,                 ## TRUE or FALSE
       panel=function(x,y,...) {
         panel.xyplot(x,y,...)    ## points, colored by group
         panel.loess(x,y,...)     ## smooth through all points in the panel
       },
       data=x,auto.key=TRUE)
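
For the pairs()-style display asked about in Question 1 (usually called a scatterplot matrix), a base-graphics sketch along the following lines might also work. The names dat, TF and cols, and the two colors, are illustrative assumptions rather than anything from the original post. Note that pairs() converts factor columns to their integer codes, so with real factor columns many points will overlap.

```r
# Sketch only: 'dat', 'TF' and 'cols' are illustrative names, not from the thread
set.seed(1)                          # make the random response reproducible
dat <- iris[, 1:4]                   # four attribute columns
dat$TF <- sample(c(TRUE, FALSE), nrow(dat), replace = TRUE)

# one color per row, chosen by the logical response
cols <- ifelse(dat$TF, "blue", "red")

# scatterplot matrix of the four attributes, colored by response
pairs(dat[, 1:4], col = cols, pch = 19)
```

Passing a per-row color vector through col is the same idea as groups=TF in the lattice call: each point is drawn in the color of its response value.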


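For the second question, merge() should work even when the two tables
contain different factor levels, as long as you specify all=TRUE; levels
missing from one table come back as NA in that table's count column.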

## get counts for TRUE and FALSE
> y <- tapply(x$Species,INDEX=x$TF,
+            function(x) as.data.frame(table(x)))
## merge results
> (z <- `names<-`(merge(y$`TRUE`,y$`FALSE`,by="x",all=TRUE),
+           c("factor","true","false")))
      factor true false
1 versicolor   29    21
2  virginica   23    27

## reshape the data frame
> library(reshape)
> melt(z,id=1)
      factor variable value
1 versicolor     true    29
2  virginica     true    23
3 versicolor    false    21
4  virginica    false    27
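
To get from counts like these to the TRUE/FALSE ratio per level asked about in Question 2, a two-way table() may be enough on its own. The sketch below uses a small made-up data frame; dat, resp and out are hypothetical names, and only the CAT column name comes from the original question.

```r
# Hypothetical stand-in for the poster's data: a category column and a logical response
dat <- data.frame(
  CAT  = c("AAAA", "AAAA", "AAAA", "BASD", "ZAQM", "ZAQM", "PPWS"),
  resp = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# cross-tabulate: one row per category, columns named "FALSE" and "TRUE"
tab <- table(dat$CAT, dat$resp)

# collect the counts; categories with no TRUE rows get a count of 0
out <- data.frame(
  CAT   = rownames(tab),
  true  = as.vector(tab[, "TRUE"]),
  false = as.vector(tab[, "FALSE"])
)

# TRUE count divided by FALSE count for each category
out$ratio <- out$true / out$false
out
```

Because every category appears as a row of the two-way table, levels with no TRUEs show up with a ratio of zero rather than being dropped. A category with no FALSE rows would give Inf (or NaN for 0/0), so that case may need separate handling.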

Hope this helps. If it doesn't you can post a small (reproducible) piece of
data and we can maybe help you out a little better...

Best regards,

ST


