thr3ads.net - R help - [R] Non-Parametric Adventures in R [Oct 2010]

Jamesp

2010-Oct-02 21:27 UTC

[R] Non-Parametric Adventures in R

I just started using R and I'm having all sorts of "fun" trying different
things.

I'm going to document the different things I'm doing here as a kind of case
study.  I'm hoping that I'll get help from the community so that I can use R
properly.

Anyways, in this study, I have demographic data, drug usage data, and side
effect data.  All of this is loaded into a csv file.  I'm using Rweb as an
interface, so I had to modify the cgi-bin code slightly, but it works pretty
well.  I'm looking for frequency counts, some summary data for columns where
it makes sense, plots and X-squared tests.  My data frame is named X since
that's what Rweb names it.

----------------------------------------------------------------------------------------------------
1) I was thinking I'd have to go through each nominal variable (i.e.
table(X$race) ), but I think I have it figured out now.  summary(X) is nice,
but I need to recode nominal data with labels so the results are meaningful.

-----------------------------------------------------------------------------------------------------
2) I had an issue with multiple plots overwriting each other, and I managed
to bypass that with:
par(mfrow=c(2,1))
I have to update it to correspond to the number of plots I think.  There's
probably a better way to do this.

barplot(table(X$race))  prints out a barplot so that's great 

-----------------------------------------------------------------------------------------------------
3) I was able to code my data so it shows up in tables better with
X$race <- factor(X$race, levels = c(0,2),
                 labels = c("African American","White,Non-Hispanic"))

----------------------------------------------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
----------------------------------------------------------------------------------------------------
4) The coding for all of my drug variables is identical, and I'd like to
create a loop that goes through and labels accordingly

I'm not having good success with this yet, but here's what I'm trying.

X[1,] <- factor(X[1,], levels = c(0,1,2,3,4,5),
                labels = c("none","last week","last 3 month","last year",
                           "regular use at least 3 months",
                           "unknown length of usage"))

I know I would need to replace the [1,] with something that gives me the
column, but I'm not sure what to put syntactically at the moment.

----------------------------------------------------------------------------------------------------
5) I had more success creating new variables based on the old ones.  So I
end up with yes/no answers to drug usage

for (i in 24:56)
{
  X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
}

I'd like to have been able to make a new variable name based off of the old
variable name (i.e. dropping "_when" from the end of each and replacing it
with "_yn")

---------------------------------------------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
---------------------------------------------------------------------------------------------------
6)  I'm able to make a cross-tabulated table and perform a X-squared test
just fine with my recoded variable

table(X$race,X[,197])
prop.test(table(X$race,X[,197]))

but I would like to be able to do so with all of my drugs, although I can't
seem to make that work

for (i in 197:229)
{
  table(X$race,X[,i])
  prop.test(table(X$race,X[,i]))
}

-------------------------------------------------------------------------------------------------

Thanks for reading over this and I do appreciate any help.  I understand
that there's "an R way" of doing things, and I look forward to learning the
method.
-- 
View this message in context:
http://r.789695.n4.nabble.com/Non-Parametric-Adventures-in-R-tp2952754p2952754.html
Sent from the R help mailing list archive at Nabble.com.

Tal Galili

2010-Oct-03 09:11 UTC

[R] Non-Parametric Adventures in R

Dear Jamesp,
This might be (more?) fitting for a blog than the R-help mailing list.

I'd suggest you open a blog (it takes less than 4 minutes) on:
wordpress.com
It now has syntax highlighting for R code:
http://www.r-statistics.com/2010/09/r-syntax-highlighting-for-bloggers-on-wordpress-com/
I also compiled a list of tips for the R blogger <http://r-bloggers.com/>
in this post:
http://www.r-statistics.com/2010/07/blogging-about-r-presentation-and-audio/


Cheers,
Tal


----------------Contact
Details:-------------------------------------------------------
Contact me: Tal.Galili@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------





Peter Dalgaard

2010-Oct-03 12:31 UTC

[R] Non-Parametric Adventures in R

>
----------------------------------------------------------------------------------------------------
>
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
----------------------------------------------------------------------------------------------------
> 4) The coding for all of my drug variables is identical, and I'd like to
> create a loop that goes through and labels accordingly
> 
> I'm not having good success with this yet, but here's what I'm trying.
> 
> X[1,] <- factor(X[1,], levels = c(0,1,2,3,4,5),
>                 labels = c("none","last week","last 3 month","last year",
>                            "regular use at least 3 months",
>                            "unknown length of usage"))
> 
> I know I would need to replace the [1,] with something that gives me the
> column, but I'm not sure what to put syntactically at the moment.
[I assume you meant X[,1] there]

Well a for loop like in 5) is not out of reach, you just need to figure
out what to loop over. It's probably neatest to do it by name, but you
could also do it by number (and that may be more convenient if the drug
variables are listed sequentially).

drugvar <- c(5,7,9,13)
--OR--
drugvar <- c("aspirin","warfarin", "heroin", "nicotine")

in either case,

mylabels <- c("none","last week","last 3 month","last year",
              "regular use at least 3 months","unknown length of usage")

for (i in drugvar)
   X[[i]] <- factor(X[[i]], levels = 0:5, labels= mylabels)

(Or X[,i]. Note that a single-bracket X[i] returns a one-column data frame
rather than the column itself, so use [[ ]] to extract the column.)

Or, using a more advanced idiom:

X[drugvar] <- lapply(X[drugvar], factor, levels=0:5, labels=mylabels)
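
A minimal sketch of both variants on made-up data, in case it helps to see them run. The data frame `toy` and its column names are hypothetical; only the labels are taken from the thread.

```r
# Toy data: two drug columns coded 0-5 (column names invented for illustration)
make_toy <- function() {
  data.frame(id = 1:4,
             aspirin_when  = c(0, 1, 5, 3),
             warfarin_when = c(2, 0, 0, 4))
}

drugvar  <- c("aspirin_when", "warfarin_when")
mylabels <- c("none", "last week", "last 3 month", "last year",
              "regular use at least 3 months", "unknown length of usage")

# Loop version: [[ ]] extracts the column as a vector before recoding
toy <- make_toy()
for (v in drugvar)
  toy[[v]] <- factor(toy[[v]], levels = 0:5, labels = mylabels)

# lapply() version: recode all listed columns in one assignment
toy2 <- make_toy()
toy2[drugvar] <- lapply(toy2[drugvar], factor, levels = 0:5, labels = mylabels)

table(toy$aspirin_when)
```

Both versions should leave the same factor columns behind; the lapply() form simply avoids writing the loop.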

>
----------------------------------------------------------------------------------------------------
> 5) I had more success creating new variables based on the old ones.  So I
> end up with yes/no answers to drug usage
> 
> for (i in 24:56)
> {
>   X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
> }
(Don't use c(0). Not that it is that harmful, it is just unnecessary and
labels yourself as a newbie...).

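I'd write the ifelse() bit as as.numeric(X[,i] > 0), and the whole thing
is very close to

X <- cbind(X, +(X[24:56] > 0))

(the unary + turns the logical matrix into 0/1 while keeping its shape;
as.numeric() on the whole matrix would flatten it into one long vector)
except for colnames issues,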
> 
> I'd like to have been able to make a new variable name based off of the old
> variable name (i.e. dropping "_when" from the end of each and replace it
> with "_yn")

sub() is your friend:

Z <- as.data.frame(+(X[24:56] > 0))   # + keeps the matrix shape, unlike as.numeric()
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)
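
As a rough illustration of the renaming step, here is a sketch on invented data. The data frame `toy` and its columns are assumptions standing in for the real columns 24:56.

```r
# Toy data frame; the "_when" columns stand in for the real drug columns
toy <- data.frame(id = 1:3,
                  aspirin_when = c(0, 2, 5),
                  heroin_when  = c(1, 0, 0))

# Select the columns by name pattern instead of by position
when_cols <- grep("_when$", names(toy), value = TRUE)

# Comparing a data frame with 0 gives a logical matrix; unary + makes it 0/1
Z <- as.data.frame(+(toy[when_cols] > 0))
names(Z) <- sub("_when$", "_yn", names(Z))
toy <- cbind(toy, Z)

names(toy)
```

Selecting by name pattern is a design choice for the sketch: it keeps working if columns are added or reordered, whereas 24:56 would have to be updated by hand.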
> 
>
---------------------------------------------------------------------------------------------------
>
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
---------------------------------------------------------------------------------------------------
> 6)  I'm able to make a cross-tabulated table and perform a X-squared test
> just fine with my recoded variable
> 
> table(X$race,X[,197])
> prop.test(table(X$race,X[,197]))
> 
> but I would like to be able to do so with all of my drugs, although I can't
> seem to make that work
> 
> for (i in 197:229)
> {
>   table(X$race,X[,i])
>   prop.test(table(X$race,X[,i]))
> }
That's basically fine, just remember to print() the results when they
are generated in a loop.
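
For illustration, a small sketch of the loop with explicit print() calls. The data are simulated and the names `toy`, `drug1` and `drug2` are made up; nothing here comes from the poster's file.

```r
set.seed(1)  # simulated toy data only
toy <- data.frame(race  = factor(sample(c("A", "B"), 40, replace = TRUE)),
                  drug1 = sample(0:1, 40, replace = TRUE),
                  drug2 = sample(0:1, 40, replace = TRUE))

for (v in c("drug1", "drug2")) {
  tab <- table(toy$race, toy[[v]])
  cat("\n==", v, "==\n")
  print(tab)             # auto-printing is switched off inside a loop
  print(prop.test(tab))  # a two-column table is read as counts of successes/failures
}
```

At the top level R prints the value of an expression automatically; inside for() it does not, which is why the original loop ran without showing anything.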

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Johannes Huesing

2010-Oct-03 12:47 UTC

[R] Non-Parametric Adventures in R

Jamesp <james.jrp015 at gmail.com> [Sat, Oct 02, 2010 at 11:27:09PM CEST]:
> [...]
>
----------------------------------------------------------------------------------------------------
> 1) I was thinking I'd have to go through each nominal variable (i.e.
> table(X$race) ), but I think I have it figured out now.  summary(X) is nice,
> but I need to recode nominal data with labels so the results are meaningful.
> 
Labels are not a concept which comes with R-base. You may want to try
the Hmisc package and the label and describe functions. Unfortunately,
reporting functions in R-base make no use of labels.
>
-----------------------------------------------------------------------------------------------------
> 2) I had an issue with multiple plots overwriting each other, and I managed
> to bypass that with:
> par(mfrow=c(2,1))
> I have to update it to correspond to the number of plots I think.  There's
> probably a better way to do this.
> 
Try for example
pdf("yourfilename.pdf")

 ... plotting routines ...

dev.off()

R does not provide a graphics browser by itself, only one graphic window,
so you may want to use the capabilities of external programs such as
your favourite pdf viewer.
> barplot(table(X$race))  prints out a barplot so that's great 
plot(table(numeric variable)) draws barplots with scaled x axis, which I
think is even greater when looking at integer random variables.
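
A minimal sketch of the pdf() route, assuming simulated data; the file location and the variable names are hypothetical.

```r
# Simulated toy data; variable names are invented for illustration
set.seed(2)
toy <- data.frame(race = sample(c("A", "B", "C"), 30, replace = TRUE),
                  age  = sample(18:25, 30, replace = TRUE))

outfile <- file.path(tempdir(), "plots.pdf")  # hypothetical output location

pdf(outfile)                                        # each new plot gets its own page
barplot(table(toy$race), main = "race")
plot(table(toy$age), main = "age", ylab = "count")  # x axis on the numeric scale
dev.off()                                           # close the device to finish the file
```

With a file device there is no need to adjust par(mfrow=...) to the number of plots, since every high-level plotting call starts a new page.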
> 
>
-----------------------------------------------------------------------------------------------------
> 3) I was able to code my data so it shows up in tables better with
> X$race <- factor(X$race, levels = c(0,2), labels = c("African
> American","White,Non-Hispanic"))
> 
>
----------------------------------------------------------------------------------------------------
>
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
----------------------------------------------------------------------------------------------------
> 4) The coding for all of my drug variables is identical, and I'd like to
> create a loop that goes through and labels accordingly
> 
Cycle over the column names, one example:

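x <- data.frame(replicate(8, sample(as.factor(c("Black", "Asian", "White",
                                                "Hispanic", "Native")),
                                    20, replace=TRUE)),
                stringsAsFactors = TRUE)  # needed since R 4.0 to get factor columns

for (col in c("X2", "X3", "X4")) {
    # levels sort alphabetically, so 2 is "Black" and 5 is "White"
    levels(x[[col]])[c(2, 5)] <- c("African American", "White, non-Hispanic")
}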

Generally, the use of loops is not encouraged. Here it is a simple thing 
to do as you need the modification of x as a side effect.
>
----------------------------------------------------------------------------------------------------
> 5) I had more success creating new variables based on the old ones.  So I
> end up with yes/no answers to drug usage
> 
> for (i in 24:56)
> {
>   X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
> }
> 
> I'd like to have been able to make a new variable name based off of the old
> variable name (i.e. dropping "_when" from the end of each and replace it
> with "_yn")
> 
untested, but along these lines (pls provide a small data example with
your questions so they can be addressed more directly):

for (col in grep("_when$", colnames(X), value = TRUE)) {
    # value = TRUE makes grep() return the column names rather than positions
    X[, sub("_when$", "_yn", col)] <- ifelse(X[, col] > 0, 1, 0)
}

if you insist on coding your _yn variables as numeric. R has a logical
(boolean) data type, so it would be more idiomatic to simply have
X[, col] > 0 without the ifelse() construct.
>
---------------------------------------------------------------------------------------------------
>
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
---------------------------------------------------------------------------------------------------
> 6)  I'm able to make a cross-tabulated table and perform a X-squared test
> just fine with my recoded variable
> 
> table(X$race,X[,197])
> prop.test(table(X$race,X[,197]))
> 
> but I would like to be able to do so with all of my drugs, although I can't
> seem to make that work
> 
> for (i in 197:229)
> {
>   table(X$race,X[,i])
>   prop.test(table(X$race,X[,i]))
> }
> 
in my toy example:

apply(x[, -1], 2, function(vec) fisher.test(table(x[, 1], vec)))

Note the non-use of a loop here, the upside being that a list
of test results is returned (which you'd have to build yourself
if using a loop). I couldn't apply a prop test here as I didn't
have vectors of trials and successes, and I wonder how you got
them out of your table() function.
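
To make the "list of test results" idea concrete, here is a sketch on simulated data; the names `toy`, `d1_yn` and `d2_yn` are invented. It uses chisq.test() as a stand-in for the X-squared test mentioned in the question. As far as the prop.test() documentation goes, that function also accepts a two-dimensional table with two columns (successes and failures), which would explain why the original call worked.

```r
set.seed(3)  # simulated toy data, not the poster's
toy <- data.frame(race  = sample(c("A", "B"), 60, replace = TRUE),
                  d1_yn = sample(0:1, 60, replace = TRUE),
                  d2_yn = sample(0:1, 60, replace = TRUE))

yn_cols <- grep("_yn$", names(toy), value = TRUE)

# One test object per drug column, collected in a named list
tests <- lapply(toy[yn_cols], function(v) chisq.test(table(toy$race, v)))

# Pull a single number out of each result
pvals <- sapply(tests, function(res) res$p.value)
pvals
```

Keeping the full test objects in a list means any other component (statistic, expected counts) can be extracted later without rerunning the tests.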

If you don't understand each single command, type ?commandname.
If you have any further questions after reading up on the 
descriptions, feel free to post them here, but please provide
toy examples of your own.
-- 
Johannes Hüsing
mailto:johannes at huesing.name
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale
returns of conjecture from such a trifling investment of fact.
(Mark Twain, "Life on the Mississippi")
