Dear Jeff,

thank you for your email.

Yes, to be more descriptive, please find attached to my email the following files (my apologies, I am sending these as attachments because I do not have a web server running at the moment):

-- the R script (R_script_display_ECDF.R) that reads the file "LENGTH" and outputs the ECDF figure using either the standard R function or ggplot2

-- the ECDF drawn with the standard R function ("display.R.ecdf.LENGTH.pdf")

-- the ECDF drawn with ggplot2 ("display.ggplot2.ecdf.LENGTH.pdf")

The ECDF over xlim(0,500) looks very different when comparing plot(ecdf) with ggplot2. Could you please advise why? What should I change in my ggplot2 code?

thanks a lot,

- bogdan

ps : the R code is also written below :

library("ggplot2")

file <- read.delim("LENGTH", sep="\t", header=T, stringsAsFactors=F)

############################# display with PLOT FUNCTION:

pdf("display.R.ecdf.LENGTH.pdf", width=10, height=6, paper='special')

plot(ecdf(file$LENGTH), xlab="DEL SIZE",
                        ylab="fraction of DEL",
                        main="LENGTH of DEL",
                        xlim=c(0,500),
                        col = "dark red", axes = FALSE)

ticks_y <- c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4)

axis(2, at=ticks_y, labels=ticks_y, col.axis="red")

ticks_x <- c(0, 100, 200, 400, 500, 600, 700, 800)

axis(1, at=ticks_x, labels=ticks_x, col.axis="blue")

dev.off()

############################# display in GGPLOT2 :

BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500,
           1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)

barfill <- "#4271AE"
barlines <- "#1F3552"

pdf("display.ggplot2.ecdf.LENGTH.pdf", width=10, height=6, paper='special')

ggplot(file, aes(LENGTH)) +
          stat_ecdf(geom = "point", colour = barlines, fill = barfill) +
          scale_x_continuous(name = "LENGTH of DEL",
                             breaks = BREAKS,
                             limits=c(0, 500)) +
          scale_y_continuous(name = "FRACTION") +
          ggtitle("ECDF of LENGTH") +
          theme_bw() +
          theme(legend.position = "bottom", legend.direction = "horizontal",
                legend.box = "horizontal",
                legend.key.size = unit(1, "cm"),
                axis.title = element_text(size = 12),
                legend.text = element_text(size = 9),
                legend.title=element_text(face = "bold", size = 9))

dev.off()

On Sat, Jul 7, 2018 at 9:47 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> It is a feature of ggplot that points excluded by limits raise warnings,
> while base graphics do not.
>
> You may find that using coord_cartesian with the xlim=c(0,500) argument
> works better with ggplot by showing the consequences of points out of the
> limits on lines within the viewport.
>
> There are other possible problems with your data that your
> non-reproducible example does not show, and sending R code in
> HTML-formatted email usually corrupts it.. so please follow the
> recommendations in the Posting Guide next time you post.
>
> On July 6, 2018 4:32:41 PM PDT, Bogdan Tanasa <tanasa at gmail.com> wrote:
> >Dear all,
> >
> >I would appreciate having your advice/suggestions/comments on the
> >following :
> >
> >1 -- starting from a vector that contains LENGTHS (numerically, the
> >values are from 1 to 10 000)
> >
> >2 -- if I display the ECDF by using the R code below with some "limits" :
> >
> >BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500,
> >           1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
> >
> >ggplot(x, aes(LENGTH)) +
> >          stat_ecdf(geom = "point") +
> >          scale_x_continuous(name = "LENGTH of DEL",
> >                             breaks = BREAKS,
> >                             limits=c(0, 500))
> >
> >3 -- I am getting the following warning message : "Warning message:
> >Removed 109 rows containing non-finite values (stat_ecdf)."
> >
> >The question is : are these 109 values removed from the VISUALIZATION
> >because I set up the "limits", or are these 109 values removed from the
> >statistical CALCULATION?
> >
> >4 -- in contrast, if I use the standard R function plot(ecdf), there is
> >no "warning message"
> >
> >plot(ecdf(x$LENGTH), xlab="DEL LENGTH",
> >                     ylab="Fraction of DEL", main="DEL", xlim=c(0,500),
> >                     col = "dark red")
> >
> >Thanks a lot !
> >
> >-- bogdan
> >
> >       [[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: display.ggplot2.ecdf.LENGTH.pdf
Type: application/pdf
Size: 8841 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20180708/75da9c56/attachment-0004.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: display.R.ecdf.LENGTH.pdf
Type: application/pdf
Size: 13600 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20180708/75da9c56/attachment-0005.pdf>
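The question in point 3 above (visualization or calculation?) can be checked numerically without any plotting. The following is only a minimal base-R sketch on made-up values, not on the LENGTH file; the names `del_len`, `F_all`, `F_kept` and `n_removed` are assumptions introduced for illustration. It mimics what happens when continuous scale limits with the default out-of-bounds handling discard values before `stat_ecdf` runs, versus `plot(ecdf(...), xlim = ...)`, which only narrows the viewport.

```r
# Hypothetical stand-in data: five short deletions and five very long ones.
del_len <- c(20, 35, 110, 280, 400, 851, 6813, 56035, 123997, 3317104)

# Base graphics: xlim changes only what is drawn. The ECDF itself is still
# computed from ALL values, so at x = 500 it reports the fraction of the
# whole data set that is <= 500.
F_all <- ecdf(del_len)
F_all(500)

# Scale limits of c(0, 500): values outside the limits are dropped before
# the statistic is computed, so the ECDF is effectively built from the
# retained subset only and reaches 1 at the upper limit.
kept   <- del_len[del_len >= 0 & del_len <= 500]
F_kept <- ecdf(kept)
F_kept(500)

# This count corresponds to the "Removed N rows" figure in the warning.
n_removed <- length(del_len) - length(kept)
n_removed
```

Under these assumptions the two curves cannot agree on the interval 0 to 500: the first one levels off at the share of short deletions, the second is rescaled to run from 0 to 1.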
Thank you for making the effort... but most attachments get stripped on the mailing list. Using the reprex package as I suggested and putting the result into the email is by far the safest approach. Since I received your email directly, I did get the attachments. Below is my reproducible example... to serve as an example for how you can get help from everyone on the list rather than just the few you are responding to.

My summary comment is that you have to decide whether the LENGTH values greater than 500 are relevant... and if they are not, you REALLY SHOULD create a data set that is limited in this fashion. Then you won't have to create "fake" axes, and you won't get ggplot warnings.

Note: The reprex package allows you to confirm that the example is in fact reproducible, so technically it is not necessary to include the plot images in the question. However, reprex used to conveniently support putting the images on the imgur website, and for some reason it no longer does that, so just run the example interactively to see the graphs.

#######

library("ggplot2")

# "file" is the name of a very fundamental function in base R. Re-using
# that name for a data value is at best confusing to anyone reading your
# code and at worst will prevent you from using that function.
#file <- read.delim("LENGTH", sep="\t", header=T, stringsAsFactors=F)

# Instead of giving us a file, keep the data within the example
# DF <- read.delim("LENGTH", sep="\t", header=T, stringsAsFactors=F)
# set.seed( 42 )
# also shrink the size of the data for the example...
# we almost never need all of it
# dput( DF[ sample( seq.int( nrow( DF ) ), size = 200 ), , drop=FALSE ] )
DF <- structure(list(LENGTH = c(6813L, 56035L, 123997L, 281L, 851L, 1072L,
72196L, 21L, 304L, 110L, 198L, 5922L, 283L, 199348L, 109L,
3317104L, 106L, 37642146L, 82641L, 20L, 125911L, 354L, 11625388L,
330L, 9811711L, 18L, 35L, 39897L, 27L, 277L, 79L, 2657L, 17L,
26L, 23L, 248L, 3634L, 21L, 324L, 206L, 328L, 42L, 286L, 6042409L,
24L, 36L, 2879L, 18L, 301L, 90684L, 4296636L, 43L, 1222L, 4536L,
3281L, 324L, 393L, 3754L, 98824541L, 459L, 18L, 1081L, 175L,
970L, 17L, 219L, 235558L, 1167315L, 25L, 623L, 2517515L, 32L,
217L, 29L, 17L, 1744L, 18L, 39L, 26L, 77L, 41L, 22L, 311L, 119015225L,
146413L, 22L, 19L, 301L, 373L, 2240L, 6439L, 128L, 18L, 257L,
783L, 5169L, 31608038L, 325L, 1533L, 25L, 69344L, 54L, 10651L,
31L, 335062L, 1854019L, 7153L, 38605567L, 51L, 23L, 16L, 301L,
79L, 313L, 18L, 29L, 39L, 22L, 17L, 306L, 67L, 280L, 324L, 158L,
93L, 2561L, 302L, 134578L, 328L, 9002L, 969051L, 34L, 20L, 309L,
355L, 28L, 9461327L, 18627013L, 305L, 64L, 18L, 2730L, 28L, 246L,
911L, 28L, 241483L, 154691L, 58891L, 55L, 456362L, 281L, 276L,
51L, 26L, 106821L, 313L, 78L, 29L, 400L, 61171382L, 200L, 101L,
220331L, 128L, 325L, 28L, 22L, 325L, 2330L, 5879L, 24L, 36L,
23L, 51L, 26L, 32584707L, 1672L, 13939L, 315L, 20L, 580785L,
42795L, 49193543L, 695L, 48568156L, 55634L, 207L, 318L, 22056L,
3670420L, 4815387L, 309L, 17L, 3143160L, 431L, 1164L, 33L, 5503L,
4166L)), .Names = "LENGTH", row.names = c(8283L, 8484L, 2591L,
7517L, 5808L, 4698L, 6665L, 1219L, 5944L, 6378L, 4140L, 6503L,
8452L, 2310L, 4180L, 8497L, 8842L, 1062L, 4293L, 5063L, 8168L,
1253L, 8932L, 8550L, 745L, 4643L, 3523L, 8177L, 4035L, 7545L,
6657L, 7319L, 3502L, 6181L, 36L, 7513L, 67L, 1873L, 8174L, 5516L,
3422L, 3928L, 338L, 8773L, 3891L, 8627L, 7997L, 5765L, 8745L,
5573L, 3003L, 3122L, 3588L, 7064L, 351L, 6739L, 6095L, 1541L,
2349L, 4628L, 6077L, 8839L, 6830L, 5094L, 7639L, 1704L, 2439L,
7443L, 6230L, 2162L, 387L, 1262L, 1944L, 4306L, 1773L, 6460L,
71L, 3371L, 4618L, 15L, 5220L, 1417L, 3222L, 5792L, 6960L, 5056L,
2096L, 807L, 768L, 2737L, 5983L, 3L, 1870L, 8361L, 8294L, 6577L,
2984L, 4614L, 6664L, 5545L, 5608L, 1945L, 1939L, 3482L, 8435L,
8615L, 6621L, 6561L, 4793L, 21L, 5447L, 7484L, 6721L, 4048L,
4790L, 4804L, 13L, 3179L, 5471L, 7407L, 3187L, 3669L, 5123L,
5267L, 6427L, 3527L, 8207L, 8593L, 2085L, 6467L, 8065L, 5385L,
5635L, 8363L, 7587L, 5172L, 7326L, 1015L, 6817L, 5560L, 1324L,
716L, 4136L, 6945L, 6536L, 7281L, 1516L, 8415L, 2616L, 1328L,
6406L, 2886L, 6933L, 3511L, 6040L, 6905L, 1672L, 259L, 1208L,
6051L, 8315L, 4896L, 5351L, 1752L, 4759L, 1597L, 4017L, 2818L,
1033L, 1654L, 6483L, 3659L, 3678L, 4266L, 3797L, 1212L, 7322L,
5258L, 7052L, 6826L, 8147L, 7655L, 2813L, 2300L, 6584L, 6629L,
8140L, 7034L, 1183L, 2551L, 1726L, 6950L, 1143L, 1144L, 641L,
471L, 4712L, 995L, 6582L, 6476L), class = "data.frame")

############################# display with PLOT FUNCTION:

# saving files should be avoided in reproducible examples... especially files
# that cannot be transmitted through the R-help mailing list such as pdf files
#pdf("display.R.ecdf.LENGTH.pdf", width=10, height=6, paper='special')

# Your original plot commands below create a fake impression of the data by
# falsifying the axes. If you really are only interested in data points less
# than 500, you should be explicit about creating a data set containing only
# such constrained values before plotting them.
plot(ecdf(DF$LENGTH), xlab="DEL SIZE",
                      ylab="fraction of DEL",
                      main="LENGTH of DEL",
                      xlim=c(0,500),
                      col = "dark red", axes = FALSE)
ticks_y <- c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4)
axis(2, at=ticks_y, labels=ticks_y, col.axis="red")
ticks_x <- c(0, 100, 200, 400, 500, 600, 700, 800)
axis(1, at=ticks_x, labels=ticks_x, col.axis="blue")

#' ![](file1f4143e5e164_reprex_files/figure-markdown_strict/reprex-body-1.png)

# my recommendation
DF500 <- subset( DF, LENGTH < 500 )
plot( ecdf( DF500$LENGTH )
    , xlab = "DEL SIZE"
    , ylab = "fraction of DEL"
    , main = "LENGTH of DEL"
    , col = "dark red"
    )

#' ![](file1f4143e5e164_reprex_files/figure-markdown_strict/reprex-body-2.png)

# alternatively
plot( ecdf( DF$LENGTH )
    , xlab = "DEL SIZE"
    , ylab = "fraction of DEL"
    , main = "LENGTH of DEL"
    , col = "dark red"
    , xlim=c( 1, 1e9 )
    , log="x"
    )

#' ![](file1f4143e5e164_reprex_files/figure-markdown_strict/reprex-body-3.png)

#dev.off()

############################# display in GGPLOT2 :

BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500,
           1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)

barfill <- "#4271AE"
barlines <- "#1F3552"

#pdf("display.ggplot2.ecdf.LENGTH.pdf", width=10, height=6, paper='special')

# ggplot's limits behavior is enabling your false representation of the data,
# but it warns you of the data removal
ggplot(DF, aes(LENGTH)) +
          stat_ecdf(geom = "point", colour = barlines, fill = barfill) +
          scale_x_continuous(name = "LENGTH of DEL",
                             breaks = BREAKS,
                             limits=c(0, 500)
                             ) +
          scale_y_continuous(name = "FRACTION") +
          ggtitle("ECDF of LENGTH") +
          theme_bw() +
          theme(legend.position = "bottom", legend.direction = "horizontal",
                legend.box = "horizontal",
                legend.key.size = unit(1, "cm"),
                axis.title = element_text(size = 12),
                legend.text = element_text(size = 9),
                legend.title=element_text(face = "bold", size = 9))
#> Warning: Removed 80 rows containing non-finite values (stat_ecdf).

#' ![](file1f4143e5e164_reprex_files/figure-markdown_strict/reprex-body-4.png)

# my recommendation
ggplot(DF500, aes(LENGTH)) +
          stat_ecdf(geom = "point", colour = barlines, fill = barfill) +
          scale_x_continuous(name = "LENGTH of DEL",
                             breaks = BREAKS ) +
          scale_y_continuous(name = "FRACTION") +
          ggtitle("ECDF of LENGTH") +
          theme_bw() +
          theme(legend.position = "bottom", legend.direction = "horizontal",
                legend.box = "horizontal",
                legend.key.size = unit(1, "cm"),
                axis.title = element_text(size = 12),
                legend.text = element_text(size = 9),
                legend.title=element_text(face = "bold", size = 9))

#' ![](file1f4143e5e164_reprex_files/figure-markdown_strict/reprex-body-5.png)

# or for the un-filtered data
ggplot(DF, aes(LENGTH)) +
          stat_ecdf(geom = "point", colour = barlines, fill = barfill) +
          scale_x_log10( name = "LENGTH of DEL") +
          scale_y_continuous(name = "FRACTION") +
          ggtitle("ECDF of LENGTH") +
          theme_bw() +
          theme(legend.position = "bottom", legend.direction = "horizontal",
                legend.box = "horizontal",
                legend.key.size = unit(1, "cm"),
                axis.title = element_text(size = 12),
                legend.text = element_text(size = 9),
                legend.title=element_text(face = "bold", size = 9))

#' ![](file1f4143e5e164_reprex_files/figure-markdown_strict/reprex-body-6.png)

#dev.off()

#' Created on 2018-07-09 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
#######

On Sun, 8 Jul 2018, Bogdan Tanasa wrote:

> Dear Jeff,
> thank you for your email.
>
> Yes, in order to be more descriptive/comprehensive, please find attached to
> my email the following files [...]

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
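The coord_cartesian suggestion from earlier in the thread is never shown as code, so here is a minimal sketch of it. It assumes ggplot2 is installed and uses a small made-up data frame rather than the LENGTH data; the names `toy` and `p_zoom` are hypothetical. The point is that `coord_cartesian(xlim = ...)` zooms the viewport after the statistic has been computed, so no rows are dropped and no warning should be raised, which matches what base `plot(ecdf(...), xlim = ...)` displays.

```r
library(ggplot2)

# Hypothetical stand-in data: five short deletions and five very long ones.
toy <- data.frame(
  LENGTH = c(20, 35, 110, 280, 400, 851, 6813, 56035, 123997, 3317104)
)

# The ECDF is computed from all rows; coord_cartesian() only restricts the
# region that is displayed, so the curve inside 0..500 keeps its true height.
p_zoom <- ggplot(toy, aes(LENGTH)) +
  stat_ecdf(geom = "step") +
  coord_cartesian(xlim = c(0, 500)) +
  labs(x = "LENGTH of DEL", y = "FRACTION", title = "ECDF of LENGTH (zoomed)")

print(p_zoom)
```

Whether to zoom like this or to subset first (as in DF500 above) depends on which question is being asked: the zoomed plot shows the fraction of all deletions up to a given length, while the subset plot shows the distribution among short deletions only.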
Dear Jeff,

thank you for all your time, and very precious help.

with best regards,

-- bogdan

On Mon, Jul 9, 2018 at 1:41 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> Thank you for making the effort... but most attachments get stripped on
> the mailing list. [...]