I just started using R and I'm having all sorts of "fun" trying different things.

I'm going to document the different things I'm doing here as a kind of case study. I'm hoping that I'll get help from the community so that I can use R properly.

Anyways, in this study, I have demographic data, drug usage data, and side effect data. All of this is loaded into a csv file. I'm using Rweb as an interface, so I had to modify the cgi-bin code slightly, but it works pretty well. I'm looking for frequency counts, some summary data for columns where it makes sense, plots and X-squared tests. My data frame is named X since that's what Rweb names it.

----------------------------------------------------------------------
1) I was thinking I'd have to go through each nominal variable (i.e. table(X$race) ), but I think I have it figured out now. summary(X) is nice, but I need to recode nominal data with labels so the results are meaningful.

----------------------------------------------------------------------
2) I had an issue with multiple plots overwriting each other, and I managed to bypass that with:

    par(mfrow=c(2,1))

I have to update it to correspond to the number of plots I think. There's probably a better way to do this.

barplot(table(X$race)) prints out a barplot so that's great

----------------------------------------------------------------------
3) I was able to code my data so it shows up in tables better with

    X$race <- factor(X$race, levels = c(0,2),
                     labels = c("African American","White,Non-Hispanic"))

----------------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
----------------------------------------------------------------------
4) The coding for all of my drug variables is identical, and I'd like to create a loop that goes through and labels accordingly

I'm not having good success with this yet, but here's what I'm trying.

    X[1,] <- factor(X[1,], levels = c(0,1,2,3,4,5),
                    labels= c("none","last week","last 3 month","last year",
                              "regular use at least 3 months","unknown length of usage"))

I know I would need to replace the [1,] with something that gives me the column, but I'm not sure what to put syntactically at the moment.

----------------------------------------------------------------------
5) I had more success creating new variables based on the old ones. So I end up with yes/no answers to drug usage

    for (i in 24:56)
    {
        X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
    }

I'd like to have been able to make a new variable name based off of the old variable name (i.e. dropping "_when" from the end of each and replace it with "_yn")

----------------------------------------------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
----------------------------------------------------------------------
6) I'm able to make a cross-tabulated table and perform a X-squared test just fine with my recoded variable

    table(X$race,X[,197])
    prop.test(table(X$race,X[,197]))

but I would like to be able to do so with all of my drugs, although I can't seem to make that work

    for (i in 197:229)
    {
        table(X$race,X[,i])
        prop.test(table(X$race,X[,i]))
    }

----------------------------------------------------------------------

Thanks for reading over this and I do appreciate any help. I understand that there's "an R way" of doing things, and I look forward to learning the method.
--
View this message in context: http://r.789695.n4.nabble.com/Non-Parametric-Adventures-in-R-tp2952754p2952754.html
Sent from the R help mailing list archive at Nabble.com.

Dear Jamesp,

This might be more fitting for a blog than for the R-help mailing list. I'd suggest you open a blog on wordpress.com (it takes less than 4 minutes).

It now has syntax highlighting for R code:
http://www.r-statistics.com/2010/09/r-syntax-highlighting-for-bloggers-on-wordpress-com/

I also compiled a list of tips for the R blogger <http://r-bloggers.com/> in this post:
http://www.r-statistics.com/2010/07/blogging-about-r-presentation-and-audio/

Cheers,
Tal

----------------Contact Details:----------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
------------------------------------------------

On Sat, Oct 2, 2010 at 11:27 PM, Jamesp <james.jrp015@gmail.com> wrote:
> [...]
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

> ----------------------------------------------------------------------
> 4) The coding for all of my drug variables is identical, and I'd like to
> create a loop that goes through and labels accordingly
>
> I'm not having good success with this yet, but here's what I'm trying.
>
> X[1,] <- factor(X[1,], levels = c(0,1,2,3,4,5), labels= c("none","last
> week","last 3 month","last year","regular use at least 3 months","unknown
> length of usage"))
>
> I know I would need to replace the [1,] with something that gives me the
> column, but I'm not sure what to put syntactically at the moment.

[I assume you meant X[,1] there]

Well, a for loop like in 5) is not out of reach, you just need to figure out what to loop over. It's probably neatest to do it by name, but you could also do it by number (and that may be more convenient if the drug variables are listed sequentially).

    drugvar <- c(5,7,9,13)

--OR--

    drugvar <- c("aspirin","warfarin", "heroin", "nicotine")

in either case,

    mylabels <- c("none","last week","last 3 month","last year",
                  "regular use at least 3 months","unknown length of usage")
    for (i in drugvar) X[[i]] <- factor(X[[i]], levels = 0:5, labels = mylabels)

(Or X[,i]. Note that X[i] with single brackets returns a one-column data frame rather than the column itself, and factor() needs the column.)

Or, using a more advanced idiom:

    X[drugvar] <- lapply(X[drugvar], factor, levels=0:5, labels=mylabels)

> ----------------------------------------------------------------------
> 5) I had more success creating new variables based on the old ones. So I
> end up with yes/no answers to drug usage
>
> for (i in 24:56)
> {
> X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
> }

(Don't use c(0). Not that it is that harmful, it is just unnecessary and marks you as a newbie...). I'd write the ifelse() bit as as.numeric(X[,i] > 0), and the whole thing is very close to

    X <- cbind(X, 1 * (X[24:56] > 0))

except for colnames issues. (Multiplying by 1 rather than calling as.numeric() keeps the result a matrix with its column names.)

> I'd like to have been able to make a new variable name based off of the old
> variable name (i.e. dropping "_when" from the end of each and replace it
> with "_yn")

sub() is your friend:

    Z <- as.data.frame(1 * (X[24:56] > 0))
    names(Z) <- sub("_when$", "_yn", names(Z))
    X <- cbind(X, Z)

> ----------------------------------------------------------------------
> 6) I'm able to make a cross-tabulated table and perform a X-squared test
> just fine with my recoded variable
>
> table(X$race,X[,197])
> prop.test(table(X$race,X[,197]))
>
> but I would like to be able to do so with all of my drugs, although I can't
> seem to make that work
>
> for (i in 197:229)
> {
> table(X$race,X[,i])
> prop.test(table(X$race,X[,i]))
> }

That's basically fine, just remember to print() the results when they are generated in a loop.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

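For readers following along, here is a minimal sketch of two points from the reply above: recoding several identically coded columns in one step, and wrapping results in print() inside a loop. The data frame, its column names and its values are made up for illustration; only factor(), lapply(), table() and print() come from the thread.

```r
# Toy data (hypothetical): two drug columns coded 0..5, plus race coded 0/2
X <- data.frame(
  race         = c(0, 2, 2, 0, 2, 0),
  aspirin_when = c(0, 1, 5, 3, 0, 2),
  heroin_when  = c(0, 0, 4, 0, 1, 0)
)

drugvar  <- c("aspirin_when", "heroin_when")
mylabels <- c("none", "last week", "last 3 month", "last year",
              "regular use at least 3 months", "unknown length of usage")

# Recode every drug column in one step; lapply() hands each column to factor()
X[drugvar] <- lapply(X[drugvar], factor, levels = 0:5, labels = mylabels)

# Inside a for loop nothing is auto-printed, so print() each table explicitly
for (v in drugvar) {
  cat("\n", v, "\n")
  print(table(X$race, X[[v]]))
}
```

Because levels = 0:5 is given explicitly, every recoded column carries all six labels even when some codes never occur, so the tables for different drugs have the same columns.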
Jamesp <james.jrp015 at gmail.com> [Sat, Oct 02, 2010 at 11:27:09PM CEST]:
> [...]
> ----------------------------------------------------------------------
> 1) I was thinking I'd have to go through each nominal variable (i.e.
> table(X$race) ), but I think I have it figured out now. summary(X) is nice,
> but I need to recode nominal data with labels so the results are meaningful.
>

Labels are not a concept which comes with R-base. You may want to try the Hmisc package and the label and describe functions. Unfortunately, reporting functions in R-base make no use of labels.

> ----------------------------------------------------------------------
> 2) I had an issue with multiple plots overwriting each other, and I managed
> to bypass that with:
> par(mfrow=c(2,1))
> I have to update it to correspond to the number of plots I think. There's
> probably a better way to do this.
>

Try for example

    pdf("yourfilename.pdf")
    ... plotting routines ...
    dev.off()

R does not provide a graphics browser by itself, only one graphic window, so you may want to use the capabilities of external programs such as your favourite pdf viewer.

> barplot(table(X$race)) prints out a barplot so that's great

plot(table(numeric variable)) draws barplots with scaled x axis, which I think is even greater when looking at integer random variables.

> ----------------------------------------------------------------------
> 3) I was able to code my data so it shows up in tables better with
> X$race <- factor(X$race, levels = c(0,2), labels = c("African
> American","White,Non-Hispanic"))
>
> ----------------------------------------------------------------------
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
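As a rough sketch of the pdf() suggestion in 2) above: the snippet below writes two plots to one file, one page per plot, so nothing gets overwritten. The toy vectors and the output file name are made up for illustration and are not from the original data.

```r
# Hypothetical toy data standing in for X$race and an integer-valued variable
race <- factor(c(0, 2, 2, 0, 2), levels = c(0, 2),
               labels = c("African American", "White,Non-Hispanic"))
visits <- c(1, 3, 3, 2, 7, 3, 1)

outfile <- file.path(tempdir(), "plots.pdf")  # file name is an arbitrary choice

pdf(outfile)            # open the pdf device; each new plot starts a new page
barplot(table(race))    # page 1: one bar per category
plot(table(visits))     # page 2: x axis is scaled, so the gap between 3 and 7 shows
dev.off()               # close the device so the file is written completely
```

Forgetting dev.off() is a common reason for an empty or unreadable file, since the device stays open and the output is never finalized.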
> ----------------------------------------------------------------------
> 4) The coding for all of my drug variables is identical, and I'd like to
> create a loop that goes through and labels accordingly
>

Cycle over the column names, one example:

    x <- data.frame(replicate(8, sample(as.factor(c("Black", "Asian", "White",
                                                    "Hispanic", "Native")),
                                        20, replace=TRUE)),
                    stringsAsFactors=TRUE)
    for (col in c("X2", "X3", "X4")) {
        levels(x[[col]])[c(2, 5)] <- c("African American", "White, non-Hispanic")
    }

Generally, the use of loops is not encouraged. Here it is a simple thing to do as you need the modification of x as a side effect.

> ----------------------------------------------------------------------
> 5) I had more success creating new variables based on the old ones. So I
> end up with yes/no answers to drug usage
>
> for (i in 24:56)
> {
> X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
> }
>
> I'd like to have been able to make a new variable name based off of the old
> variable name (i.e. dropping "_when" from the end of each and replace it
> with "_yn")
>

Untested, but along these lines (please provide a small data example with your questions so they can be addressed more directly):

    for (col in grep("_when$", colnames(X), value=TRUE)) {
        X[, sub("_when$", "_yn", col)] <- ifelse(X[, col] > 0, 1, 0)
    }

if you insist on coding your _yn variables as numeric. In R, a boolean data type (called logical) exists, so it would be more idiomatic to simply have X[, col] > 0 without the ifelse() construct.

> ----------------------------------------------------------------------
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
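A small sketch of the logical-column variant mentioned in 5) above, assuming the drug columns all end in "_when". The data frame and its column names are hypothetical; grep() with value = TRUE returns the matching names, and sub() derives each new name from the old one.

```r
# Toy data (hypothetical column names ending in "_when", coded 0..5)
X <- data.frame(
  id           = 1:4,
  aspirin_when = c(0, 1, 5, 0),
  heroin_when  = c(0, 0, 4, 2)
)

# value = TRUE makes grep() return the matching names instead of positions
when_cols <- grep("_when$", colnames(X), value = TRUE)

for (col in when_cols) {
  yn <- sub("_when$", "_yn", col)   # e.g. "aspirin_when" becomes "aspirin_yn"
  X[[yn]] <- X[[col]] > 0           # logical column: TRUE means any reported use
}

print(colnames(X))

# TRUE counts as 1 in sum(), so this gives the number of users per drug
yn_cols <- sub("_when$", "_yn", when_cols)
print(sapply(X[yn_cols], sum))
```

Logical columns still work in table() and in sums or means, so the 0/1 coding is only needed if some downstream tool insists on numbers.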
> ----------------------------------------------------------------------
> 6) I'm able to make a cross-tabulated table and perform a X-squared test
> just fine with my recoded variable
>
> table(X$race,X[,197])
> prop.test(table(X$race,X[,197]))
>
> but I would like to be able to do so with all of my drugs, although I can't
> seem to make that work
>
> for (i in 197:229)
> {
> table(X$race,X[,i])
> prop.test(table(X$race,X[,i]))
> }
>

In my toy example:

    apply(x[, -1], 2, function(vec) fisher.test(table(x[, 1], vec)))

Note the non-use of a loop here, the upside being that a list of test results is returned (which you'd have to build yourself if using a loop). I couldn't apply a prop test here as I didn't have vectors of trials and successes, and I wonder how you got them out of your table() function.

If you don't understand each single command, type ?commandname. If you have any further questions after reading up on the descriptions, feel free to post them here, but please provide toy examples of your own.

--
Johannes Hüsing
mailto:johannes at huesing.name
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, "Life on the Mississippi")
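To illustrate the point above about getting a list of test results back, here is a hedged sketch on made-up data. The column names, sample size and seed are assumptions; fisher.test() is used as in the toy example above, and lapply() stands in for apply() so that the list keeps the column names.

```r
set.seed(1)  # only so the toy data are reproducible

# Toy data (hypothetical): a two-level race factor and three yes/no drug columns
n <- 40
X <- data.frame(
  race        = factor(sample(c("A", "B"), n, replace = TRUE)),
  aspirin_yn  = sample(c(TRUE, FALSE), n, replace = TRUE),
  heroin_yn   = sample(c(TRUE, FALSE), n, replace = TRUE),
  nicotine_yn = sample(c(TRUE, FALSE), n, replace = TRUE)
)

yn_cols <- grep("_yn$", names(X), value = TRUE)

# One test per drug column; lapply() returns the results as a named list
tests <- lapply(X[yn_cols], function(vec) fisher.test(table(X$race, vec)))

# Pull one number out of every result, here the p-value
pvals <- sapply(tests, function(res) res$p.value)
print(round(pvals, 3))
```

A single result can then be inspected by name, for example tests$heroin_yn, without rerunning anything.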