Jim Lemon
2020-May-22 04:08 UTC
[R] how to show percentage of individuals for two groups on histogram?
Hi Ana,
My apologies for the pedestrian graphics, but it may help.
# a bit of fake data
aafd<-data.frame(FID=paste0("fam",1000:2739),
IID=paste0("G",1000:2739),FLASER=rep(1,1740),
PLASER=c(rep(1,892),rep(2,848)),
DIABDUR=sample(10:50,1740,TRUE),
HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740),
pheno=c(rep("control",892),rep("case",848)))
par(mfrow=c(2,1))
casepct<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
controlpct<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
par(mar=c(0,4,1,2))
barpos=barplot(100*casehist,names.arg=names(casepct),col="orange",
space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
text(mean(barpos),23,
"Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
box()
par(mar=c(3,4,0,2))
barplot(100*controlhist,names.arg=names(controlpct),
space=0,ylab="Percentage",col="orange",ylim=c(0,25))
text(mean(barpos),23,
"Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
box()
Jim
On Fri, May 22, 2020 at 9:08 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
>
> the result would basically look something like the one in the attachment, or
> like an overlay of those two plots
>
>
> On Thu, May 21, 2020 at 5:23 PM Ana Marija <sokovic.anamarija at gmail.com> wrote:
> >
> > Hello,
> >
> > I have a data frame like this:
> > > head(a)
> >          FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
> > 1 fam1000-03 G1000      1      1      38  10.2    1 control
> > 2 fam1001-03 G1001      1      1      15   7.3    1 control
> > 3 fam1003-03 G1003      1      2      17   7.0    1    case
> > 4 fam1005-03 G1005      1      1      36   7.7    1 control
> > 5 fam1009-03 G1009      1      1      23   7.6    1 control
> > 6 fam1052-03 G1052      1      1      32   7.3    1 control
> >
> > > dim(a)
> > [1] 1698 8
> >
> > I am doing histogram plot via:
> > ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5,
> > position="dodge")
> >
> > There are 848 individuals who have "case" in the pheno column and 892 who
> > have "control" in the pheno column.
> >
> > I would like the y-axis to show the percentage of individuals in each
> > group ("case" or "control" in pheno) instead of the count.
> >
> > Please advise,
> > Ana
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Jim Lemon
2020-May-22 04:14 UTC
[R] how to show percentage of individuals for two groups on histogram?
Hi Ana,
Just noticed a typo from a hasty cut-paste. Two lines should read:
casehist<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
controlhist<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
Jim
Eric Berger
2020-May-22 05:18 UTC
[R] how to show percentage of individuals for two groups on histogram?
Hi Ana,
This is a very common question about ggplot. A quick search turns up lots of
hits that answer your question. Here are a couple:
https://community.rstudio.com/t/trouble-scaling-y-axis-to-percentages-from-counts/42999
https://stackoverflow.com/questions/3695497/show-instead-of-counts-in-charts-of-categorical-variables
From reading those discussions, the following should work (untested):
ggplot(a, aes(x = HBA1C, fill=pheno)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())
HTH,
Eric
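A note on the density approach above: density equals the share of a group only when the bin width is 1, so with binwidth = 0.5 the percent labels would read twice the actual share. The following is a minimal sketch of one way to get per-group percentages, assuming ggplot2 3.3.0 or later (for after_stat()) and using a simulated stand-in for the data frame `a`; the simulated values and the axis label are assumptions, not the poster's data.

```r
library(ggplot2)

# Hypothetical stand-in for the poster's data frame `a` (simulated values).
set.seed(1)
a <- data.frame(
  HBA1C = c(rnorm(892, mean = 7.4, sd = 1.1), rnorm(848, mean = 7.5, sd = 2)),
  pheno = rep(c("control", "case"), times = c(892, 848))
)

# stat_bin computes `density` and `width` separately for each fill group,
# so density * width is the share of that group falling in each bin.
p <- ggplot(a, aes(x = HBA1C, fill = pheno)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 binwidth = 0.5, position = "dodge") +
  scale_y_continuous(labels = scales::percent_format()) +
  ylab("Percentage of group")

# Check: within each group the bar heights should sum to 1 (i.e. 100%).
bars <- ggplot_build(p)$data[[1]]
group_totals <- tapply(bars$y, bars$group, sum)
print(group_totals)
```

Calling print(p) draws the plot; position = "identity" with a partly transparent fill would give overlaid rather than side-by-side bars.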
Ana Marija
2020-May-22 14:45 UTC
[R] how to show percentage of individuals for two groups on histogram?
Hi Jim,
Thank you so much for getting back to me. I tried your code and got the
attached plot.
I think the issue is in calculating the percentage per group (cases or controls):
par(mfrow=c(2,1))
casehist<-table(cut(a$HBA1C[a$pheno=="case"],breaks=0:15))
controlhist<-table(cut(a$HBA1C[a$pheno=="control"],breaks=0:15))
par(mar=c(0,4,1,2))
barpos=barplot(100*casehist,names.arg=names(casehist),col="orange",
space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
text(mean(barpos),23,
"Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
box()
par(mar=c(3,4,0,2))
barplot(100*controlhist,names.arg=names(controlhist),
space=0,ylab="Percentage",col="orange",ylim=c(0,25))
text(mean(barpos),23,
"Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
box()
I can send you the whole dataset if you would like to try it.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2020-05-22 at 9.42.01 AM.png
Type: image/png
Size: 88187 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20200522/bb7b7549/attachment.png>
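The per-group calculation mentioned in the message above can be sketched as follows. Multiplying the binned counts by 100 gives scaled counts, not percentages, so the bars overshoot a y-axis limited to 0-25; dividing by the group size first gives the share per bin. The vector below is simulated and its name is hypothetical; it is not the poster's data.

```r
# Simulated stand-in for one group's HBA1C values (not the poster's data).
set.seed(42)
hba1c_case <- rnorm(848, mean = 7.45, sd = 2)

# Use breaks that cover the whole observed range so cut() yields no NA.
breaks <- seq(floor(min(hba1c_case)), ceiling(max(hba1c_case)))
case_counts <- table(cut(hba1c_case, breaks = breaks, include.lowest = TRUE))

# Counts times 100: far larger than any percentage.
scaled_counts <- 100 * case_counts

# Share of the group in each bin, in percent.
case_pct <- 100 * case_counts / sum(case_counts)

print(max(scaled_counts))
print(sum(case_pct))
```

Passing case_pct instead of scaled_counts to barplot() keeps the bars within a percentage axis.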
Jim Lemon
2020-May-23 00:03 UTC
[R] how to show percentage of individuals for two groups on histogram?
Hi Ana,
I think this is what you want in the panel style of plot. Let me know
if not, or if I have calculated the wrong percentages. The overlaid
histograms definitely use a different calculation.
amsdf<-read.table("pheno_m1_plot",header=TRUE,stringsAsFactors=FALSE)
dim(amsdf)
# find the right breaks for your "cut"
casen<-table(cut(amsdf$HBA1C[amsdf$pheno==2],breaks=3:14))
controln<-table(cut(amsdf$HBA1C[amsdf$pheno==1],breaks=3:14))
# save yourself some typing
HBA1C2<-amsdf$HBA1C[amsdf$pheno==2]
HBA1C1<-amsdf$HBA1C[amsdf$pheno==1]
ncases<-length(HBA1C2)
ncontrols<-length(HBA1C1)
split.screen(matrix(c(0,1,0.6,1,0,1,0,0.6),nrow=2,byrow=TRUE))
par(mar=c(0,4,1,2))
barpos=barplot(100*casen/ncases,names.arg=NA,col="orange",
space=0,ylab="Percentage",xaxt="n",ylim=c(0,27))
case_text<-sprintf(
"Cases: n=%d, nulls=%d, median=%.1f, mean=%.1f, sd=%.1f",
length(HBA1C2),sum(is.na(HBA1C2)),round(median(HBA1C2,na.rm=TRUE),1),
round(mean(HBA1C2,na.rm=TRUE),1),round(sd(HBA1C2,na.rm=TRUE),1))
text(mean(barpos),25,case_text)
box()
screen(2)
par(mar=c(4,4,0,2))
barplot(100*controln/ncontrols,names.arg=NA,
space=0,ylab="Percentage",col="orange",ylim=c(0,34))
control_text<-sprintf(
"Controls: n=%d, nulls=%d, median=%.1f, mean=%.1f, sd=%.1f",
length(HBA1C1),sum(is.na(HBA1C1)),round(median(HBA1C1,na.rm=TRUE),1),
round(mean(HBA1C1,na.rm=TRUE),1),round(sd(HBA1C1,na.rm=TRUE),1))
text(mean(barpos),32,control_text)
box()
library(plotrix)
staxlab(1,at=barpos,labels=names(casen))
Jim
On Sat, May 23, 2020 at 9:01 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
>
> Hi Jim,
>
> My data is attached. It is most kind of you for looking into this!
>
> Cheers,
> Ana
>
> On Fri, May 22, 2020 at 5:49 PM Jim Lemon <drjimlemon at gmail.com> wrote:
> >
> > Hi Ana,
> > As I had very little idea what your data looked like, what I made up
> > obviously didn't fit in the plot that well. If you can send the data I
> > can make a better attempt. The other thing is whether you want a plot
> > with two adjacent panels (what I sent) or overlaid histograms (what
> > Eric sent). Let me know.
> >
> > Jim
> >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ams1.png
Type: image/png
Size: 18251 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20200523/ee75ea08/attachment.png>
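On the question of whether the percentages in the panel plot are the intended ones: the code above divides by length(), which counts missing values too, so when a group has nulls its bars sum to a little under 100%. A minimal sketch of the two denominators, using simulated values with 26 NAs (hypothetical, not the attached data):

```r
# Simulated values with some missing entries (hypothetical data).
set.seed(7)
hba1c <- c(rnorm(822, mean = 7.45, sd = 1.5), rep(NA, 26))
# Keep the non-missing values inside the breaks used below; NAs stay NA.
hba1c <- pmin(pmax(hba1c, 3.5), 13.5)

counts <- table(cut(hba1c, breaks = 3:14))

# Denominator includes the missing values.
pct_of_all <- 100 * counts / length(hba1c)

# Denominator counts only the measured values.
pct_of_measured <- 100 * counts / sum(!is.na(hba1c))

print(sum(pct_of_all))
print(sum(pct_of_measured))
```

Which denominator is right depends on whether the percentage is meant to be of everyone in the group or only of those with a measurement. Values falling outside the chosen breaks are also dropped by cut(), which lowers the total in the same way.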