Ritwik Mohapatra
2020-Jul-24 13:29 UTC
[R] How to create a readable plot in R with 10000+ values in a dataframe
Hi All,

These are the two pieces of code I have used so far:

    ggplot(df3_machine_region, aes(Region, Machine.Name)) +
      geom_count()

    ggplot(df3_machine_region, aes(Region, Machine.Name)) +
      geom_jitter(aes(colour = Region))

I have to present the plot to my stakeholders, which is why it needs to be readable and legible.

There would be approximately 10k+ values (max) for the machine and region combination.

I have attached the output plots for your reference. Please find below a snapshot of the data.

|Machine.Name|Region|
|0460-EPBS1.sga-res.com|Europe|
|04821-EABS1.sga-res.com|Europe|
|10429-EDABS1.sga-res.com|Europe|
|1042619-ESWEBS1.sga-res.com|Europe|
|ABE-L-98769.europe.shell.com|Americas|
|AB-L-98769.europe.shell.com|APAC|
|AB-L-98769.europe.shell.com|Europe|
|ABE-L-98769.europe.shell.com (2)|Americas|
|ABE-L-98769.europe.shell.com (2)|Europe|
|ABE-L-98840.europe.shell.com|Americas|
|AB-L-98840.europe.shell.com|APAC|
|ABE-L-98840.europe.shell.com|Europe|
|AB-L-98854.europe.shell.com|Americas|
|ABE-L-98854.europe.shell.com|Europe|
|ABE-L-98862.europe.shell.com|Americas|

Regards,
Ritwik

On Fri, Jul 24, 2020 at 6:05 PM Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> >>>>> Ritwik Mohapatra
> >>>>>     on Thu, 23 Jul 2020 23:41:57 +0530 writes:
>
>     > How to create a readable and legible plot in R with 10k+ values. I have a
>     > dataframe with 17298 records. There are two columns: Machine Name (Character)
>     > and Region (Character). So I want to create a readable plot with region on the x
>     > axis and machine name on the y axis. How do I do that using ggplot or any other
>     > way? Please help.
>
> Good answers to this question will depend very much on how many
> 'Machine' and 'Region' levels there are.
>
> (and this is a case where in my opinion it'd be *MUCH* more
> useful to have 'factor' instead of 'character'.. if only just so
>       str(<data>)
> or    summary(<data>)
>
> would give useful/relevant information.
>
> --
> One possibility for a somewhat cute plot is a "good ole"
> sunflower plot (base graphics, but the idea must be easily
> transferable to grid-based graphics such as ggplot2):
>
>       help(sunflowerplot)
>
> Martin Maechler
> ETH Zurich

Attachments:
1st Plot.png (image/png, 29513 bytes): https://stat.ethz.ch/pipermail/r-help/attachments/20200724/24142a98/attachment.png
2nd Plot.png (image/png, 35282 bytes): https://stat.ethz.ch/pipermail/r-help/attachments/20200724/24142a98/attachment-0001.png
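
Martin's two suggestions (factors instead of characters, and a sunflower plot) could look roughly like the sketch below. It is only an illustration: the data frame `df` is a small stand-in built from six rows of the snapshot, not the real `df3_machine_region`, and using the integer codes of the factors as plot coordinates is an assumption made here because sunflowerplot() expects numeric input.

```r
# Toy stand-in for df3_machine_region (six rows taken from the posted snapshot).
df <- data.frame(
  Machine.Name = c("0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
                   "AB-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
                   "ABE-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com"),
  Region = c("Europe", "Europe", "APAC", "Europe", "Americas", "Europe"),
  stringsAsFactors = FALSE
)

# Convert the character columns to factors so that str() and summary()
# report the number of levels and the counts per level.
df$Machine.Name <- factor(df$Machine.Name)
df$Region <- factor(df$Region)

str(df)
summary(df)

n_machines <- nlevels(df$Machine.Name)
n_regions <- nlevels(df$Region)

# sunflowerplot() needs numeric coordinates, so the integer codes of the
# factors are used; repeated (Region, Machine) pairs would be drawn as petals.
sunflowerplot(as.integer(df$Region), as.integer(df$Machine.Name),
              xaxt = "n", xlab = "Region", ylab = "Machine (factor code)")
axis(1, at = seq_len(n_regions), labels = levels(df$Region))
```

With the full data, nlevels() on each column answers Martin's question of how many 'Machine' and 'Region' levels there are, which in turn decides whether any single plot can stay legible.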
Jim Lemon
2020-Jul-29 09:31 UTC
[R] How to create a readable plot in R with 10000+ values in a dataframe
Hi Ritwik,

I haven't seen any further answers to your request, so I'll make a suggestion. I don't think there is any sensible way to illustrate that many data points on a single plot. I would try to segment the data by machine type or similar and plot a number of plots.

Jim

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
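
One way Jim's idea of segmenting the data might be sketched is shown below. The posted data has no "machine type" column, so the grouping used here (the domain part of the host name, stored in a hypothetical `Domain` column) is purely an assumption for illustration, as is the toy data frame `df` standing in for `df3_machine_region`. The plotting step is skipped if ggplot2 is not installed.

```r
# Toy stand-in for df3_machine_region (six rows taken from the posted snapshot).
df <- data.frame(
  Machine.Name = c("0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
                   "AB-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
                   "ABE-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com"),
  Region = c("Europe", "Europe", "APAC", "Europe", "Americas", "Europe"),
  stringsAsFactors = FALSE
)

# Hypothetical segmentation: everything after the first dot of the host name.
df$Domain <- sub("^[^.]*\\.", "", df$Machine.Name)

# Count distinct machines per Region within each segment.
uniq <- unique(df[, c("Machine.Name", "Region", "Domain")])
counts <- as.data.frame(table(Region = uniq$Region, Domain = uniq$Domain),
                        responseName = "Machines", stringsAsFactors = FALSE)

# One small panel per segment instead of one plot with every machine name.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(Region, Machines)) +
    geom_col() +
    facet_wrap(~ Domain)
  print(p)
}
```

Any other column or naming rule that splits the machines into a handful of groups could replace `Domain`; the point is only that each panel then shows a manageable amount of data.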
Carlos Ortega
2020-Jul-29 19:04 UTC
[R] How to create a readable plot in R with 10000+ values in a dataframe
Hello Ritwik,

There is another possibility. You can count (cross-tabulate) the number of elements for each Region and Machine with the table() function and represent this table with the geom_tile() function. With this you will get the equivalent of a heatmap, which will give you a good sense of which combinations of Region/Machine prevail.

Here is an example of how to use it:
- https://www.r-graph-gallery.com/79-levelplot-with-ggplot2.html

And, just in case you have to represent numeric values (a numeric scatter plot), there is an excellent way to graph that with this package, without leaving the ggplot ecosystem:
https://github.com/LKremer/ggpointdensity

Thanks,
Carlos Ortega.
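
A minimal sketch of the table() plus geom_tile() approach Carlos describes might look like this. The data frame `df` is a toy stand-in built from six rows of the posted snapshot, and the colour choices are arbitrary; the plotting step is skipped if ggplot2 is not installed.

```r
# Toy stand-in for df3_machine_region (six rows taken from the posted snapshot).
df <- data.frame(
  Machine.Name = c("0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
                   "AB-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
                   "ABE-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com"),
  Region = c("Europe", "Europe", "APAC", "Europe", "Americas", "Europe"),
  stringsAsFactors = FALSE
)

# Cross-tabulate machines against regions, then reshape to long format:
# one row per (Machine.Name, Region) cell with its count.
tab <- table(Machine.Name = df$Machine.Name, Region = df$Region)
tile_df <- as.data.frame(tab, responseName = "Count")

# Draw the counts as a heatmap, one tile per cell.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tile_df, aes(Region, Machine.Name, fill = Count)) +
    geom_tile(colour = "white") +
    scale_fill_gradient(low = "grey90", high = "steelblue")
  print(p)
}
```

With thousands of distinct machine names the y axis labels would still overlap, so in practice the cross-tabulation would probably need to be done on a coarser grouping of machines than the individual host name.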