Vining, Kelly
2011-Dec-09 20:05 UTC
[R] Fixed! Thanks all:RE: scatterplot to boxplot translation?
Thanks to David and Jorge - both of your helpful suggestions got me to the desired endpoint. In case anyone else has this question: I boxplotted my y variable data, but did the "cut" operation on the x variable in order to conserve the order of the y data. I see another suggestion coming in from another user that basically says this.

So, my working line of code was:

boxplot(count$RPKM ~ cut(count$C_count, breaks=4))

Much appreciation to everyone who responded...thanks for helping with a naïve question without making me feel stupid.

This discussion board is very, very good.

--Kelly V.

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Friday, December 09, 2011 11:58 AM
To: Uwe Ligges
Cc: Vining, Kelly; r-help at r-project.org
Subject: Re: [R] scatterplot to boxplot translation?

On Dec 9, 2011, at 2:50 PM, Uwe Ligges wrote:

> On 09.12.2011 20:41, Vining, Kelly wrote:
>> Thanks for the tip on "cut," seems like it should work. I must still
>> be missing something, though. Here, I'm cutting on the y variable,
>> then attempting the boxplot:
>>
>> cutRPKM<- cut(count$RPKM, breaks=4)
>>
>> head(cutRPKM)
>> [1] (-0.0995,24.8] (-0.0995,24.8] (-0.0995,24.8] (-0.0995,24.8] (-0.0995,24.8]
>> [6] (-0.0995,24.8]
>> Levels: (-0.0995,24.8] (24.8,49.8] (49.8,74.7] (74.7,99.6]
>>
>> boxplot(as.numeric(cutRPKM))
>>
>> This gives me a single box instead of five boxes. ??
>
> You obviously want:
>
> boxplot(count$RPKM ~ cut(count$RPKM, breaks=seq(0, max(count$RPKM), by=100)))

In that context (having defined a cut-variable with a single-integer break argument), I would have thought this should work:

boxplot(count$RPKM ~ cutRPKM)

--
David.

> Uwe Ligges
>
>> Thanks again,
>> --Kelly V.
>> ________________________________________
>> From: David Winsemius [dwinsemius at comcast.net]
>> Sent: Friday, December 09, 2011 11:14 AM
>> To: Vining, Kelly
>> Cc: r-help at r-project.org
>> Subject: Re: [R] scatterplot to boxplot translation?
>>
>> On Dec 9, 2011, at 2:11 PM, Vining, Kelly wrote:
>>
>>> My apologies if anyone is seeing this twice...looks like my previous
>>> message didn't come through...
>>>
>>> Dear UseRs,
>>> I have a feeling this is a relatively simple question, but I'm
>>> having a hard time getting my head around it. I have a simple x-y
>>> scatterplot with many points, as shown below (attached). I'd like to
>>> make a boxplot of this by interval, such that there is one box
>>> representing the points in the 0-100 interval, one for the 101-200
>>> interval, and so on. How do I structure my R data frame to be able
>>> to generate such a boxplot?
>>
>> ?cut
>>
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Vining, Kelly
>>> Sent: Friday, December 09, 2011 11:01 AM
>>> To: r-help at r-project.org
>>> Subject: [R] scatterplot to boxplot translation?
>>>
>>> <C_count_vs_RPKM.png>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT

David Winsemius, MD
West Hartford, CT
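For readers who want to try the approach from this post, here is a minimal sketch. The poster's `count` data frame was not shared, so the data below are simulated and the column names `bin4` and `bin100` are made up for illustration. The sketch bins the x variable with `cut()` and passes the resulting factor to the formula interface of `boxplot()`. It also hints at why `boxplot(as.numeric(cutRPKM))` drew a single box: `as.numeric()` on a factor returns the integer level codes, and a single numeric vector is drawn as one box.

```r
# Simulated stand-in for the poster's data (assumption: the real 'count'
# data frame has numeric columns C_count and RPKM).
set.seed(1)
count <- data.frame(C_count = runif(500, min = 0, max = 400))
count$RPKM <- pmax(0, 50 - 0.1 * count$C_count + rnorm(500, sd = 10))

# breaks = 4 asks cut() for four equal-width intervals over the range of x.
count$bin4 <- cut(count$C_count, breaks = 4)

# Explicit breaks every 100 units are closer to the original request
# (0-100, 100-200, ...). Round the top break up so the maximum is included.
upper <- 100 * ceiling(max(count$C_count) / 100)
count$bin100 <- cut(count$C_count,
                    breaks = seq(0, upper, by = 100),
                    include.lowest = TRUE)

# One box of y values per x interval: numeric response ~ grouping factor.
boxplot(RPKM ~ bin100, data = count,
        xlab = "C_count interval", ylab = "RPKM")

# A factor converted with as.numeric() is just a vector of level codes,
# so this call would draw a single box of the codes, not one box per bin.
codes <- as.numeric(count$bin100)
```

With a single integer for `breaks`, `cut()` widens the outer limits slightly so the extreme values fall inside a bin, which is consistent with the negative lower limit in the `head(cutRPKM)` output quoted above.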
Bert Gunter
2011-Dec-09 21:23 UTC
[R] Fixed! Thanks all:RE: scatterplot to boxplot translation?
Kelly:

Glad you got what you were looking for, but this whole thread raises the question: (why) should you do this? You lose information in binning the continuous data, of course.

Perhaps your answer is that the point scatter in the data is too noisy to clearly discern what's going on, a legitimate response. One might then -- or in general -- consider overlaying a fitted smooth (nonparametric) curve on the data to reveal the "trend." There are a zillion ways to do this in R: both lattice and ggplot have built-in capabilities to do this easily, as does base R with ?scatter.smooth. If that's too easy, you can do it by hand via ?lowess (or its more flexible cousin, ?loess), ?smooth.spline, etc.

In actuality, your binning strategy is a crude, non-smooth version of such smoothing, so it's not that far-fetched. Or as some of the choicer R-Help pages say, cutting and boxplotting is to smoothing as histograms are to nonparametric density estimates.

Cheers,
Bert

On Fri, Dec 9, 2011 at 12:05 PM, Vining, Kelly <Kelly.Vining at oregonstate.edu> wrote:
> Thanks to David and Jorge - both of your helpful suggestions got me to the desired endpoint. In case anyone else has this question: I boxplotted my y variable data, but did the "cut" operation on the x variable in order to conserve the order of the y data. I see another suggestion coming in from another user that basically says this.
>
> So, my working line of code was:
>
> boxplot(count$RPKM ~ cut(count$C_count, breaks=4))
>
> Much appreciation to everyone who responded...thanks for helping with a naïve question without making me feel stupid.
>
> This discussion board is very, very good.
>
> --Kelly V.

--
Bert Gunter
Genentech Nonclinical Biostatistics
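The smoothing alternative described in this reply can be sketched in base R as follows. The data are simulated because the original data were not posted, and the variable names `x`, `y`, `fit`, `lo`, `grid` and `smooth_loess` are illustrative only. The smoothing parameters shown (`f = 2/3`, `span = 0.75`) are the documented defaults, not values recommended anywhere in the thread.

```r
# Simulated stand-in for C_count (x) and RPKM (y); not the poster's data.
set.seed(1)
x <- runif(500, min = 0, max = 400)
y <- pmax(0, 50 - 0.1 * x + rnorm(500, sd = 10))

# Keep every point on the plot instead of collapsing them into bins.
plot(x, y, pch = 16, col = "grey60", xlab = "C_count", ylab = "RPKM")

# lowess() returns a list with x sorted and the smoothed y values,
# which lines() can draw directly.
fit <- lowess(x, y, f = 2/3)
lines(fit, col = "red", lwd = 2)

# loess() is fitted through a formula; predict on a regular grid
# inside the observed range to draw the curve.
lo <- loess(y ~ x, span = 0.75)
grid <- seq(min(x), max(x), length.out = 200)
smooth_loess <- predict(lo, newdata = data.frame(x = grid))
lines(grid, smooth_loess, col = "blue", lwd = 2, lty = 2)
```

For a one-line version, `scatter.smooth(x, y)` draws the scatterplot and a loess curve together.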