IMO, with such simple data, a plot is mere chartjunk. A simple table (the
distribution) would suffice and be more informative:
> table(bug)  ## bug is a vector. No data frame is needed
  0   1   2   3   4   5   7   ## bug count
162  40   9   7   2   1   1   ## number of cases with the given count
You or others may disagree, of course.
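
For the two-group split the advisor asked about (bug = 0 versus bug > 0), the same tabulation can be collapsed further. The following is only a sketch: the vector is rebuilt from the counts in the table above, so its order differs from the posted data (in practice it would be data$bug), and the names bug_status and zero_vs_pos are illustrative only.

```r
# Rebuild a vector with the same distribution as the table above.
# Assumption: element order does not matter for tabulation; with the real
# data this would simply be bug <- data$bug.
bug <- rep(c(0, 1, 2, 3, 4, 5, 7), times = c(162, 40, 9, 7, 2, 1, 1))

# Full distribution of bug counts
counts <- table(bug)
print(counts)

# Collapse into the two groups of interest: bug-free vs. buggy classes
bug_status <- factor(bug > 0, levels = c(FALSE, TRUE),
                     labels = c("bug = 0", "bug > 0"))
zero_vs_pos <- table(bug_status)
print(zero_vs_pos)

# The same split expressed as proportions of all classes
print(round(prop.table(zero_vs_pos), 3))
```

A two-cell table like this may answer the "how many classes have bugs" question more directly than any boxplot or histogram.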
Bert Gunter
On Thu, Feb 17, 2022 at 11:56 AM Neha gupta <neha.bologna90 at gmail.com> wrote:
>
> Ebert and Rui, thank you for providing the tips (in fact, for providing the
> answer I needed).
>
> Yes, you are right that a boxplot of all-zero values will not make sense.
> Maybe a histogram will work.
>
> I am providing a few details of my data here and the context of the
> question I asked.
>
> My data is about bugs/defects in different classes of a large software
> system. I have to predict which classes will contain bugs and which will be
> free of bugs (bug = 0). I trained ML models to make the predictions, but my
> advisor asked me to first provide the data distribution of bugs, e.g. how
> many classes have bugs (bug > 0) and how many are free of bugs (bug = 0).
>
> That is why I need to provide the data distribution of both types of values
> (i.e. bug = 0 and bug > 0).
>
> Thank you again.
>
> On Thu, Feb 17, 2022 at 8:28 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
> > Hello,
> >
> > In your original post you read the same file "synapse.arff" twice,
> > apparently to filter each copy by its own criterion. You don't need
> > to do that; read it once and filter that one copy by different criteria.
> >
> > As for the data as posted, I have read it in with the following code:
> >
> >
> > x <- "
> > 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0
> > 4 1 0
> > 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0
> > 0 0 0
> > 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7
> > 0 0 1
> > 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0
> > 1 0 0
> > 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0
> > 0 0 1
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> > "
> > bug <- scan(text = x)
> > data <- data.frame(bug)
> >
> >
> > This is not the right way to post data; the posting guide asks you to
> > post the output of
> >
> >
> > dput(data)
> > structure(list(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0,
> > 0, 0, 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0,
> > 3, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
> > 0, 0, 1, 1, 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0,
> > 1, 0, 0, 1, 0, 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0,
> > 1, 1, 0, 2, 0, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
> > 0, 1, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0,
> > 0, 0, 0, 0, 1, 0, 4, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)),
> > class = "data.frame", row.names = c(NA, -222L))
> >
> >
> >
> > This can be copied into an R session and the data set recreated with
> >
> > data <- structure(etc)
> >
> >
> > Now the boxplots.
> >
> > (Why would you want to plot a vector of all zeros, btw?)
> >
> >
> >
> > library(dplyr)
> >
> > boxplot(filter(data, bug == 0)) # nonsense
> > boxplot(filter(data, bug > 0), range = 0)
> >
> > # Another way
> > data %>%
> >   filter(bug > 0) %>%
> >   boxplot(range = 0)
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> > At 19:03 on 17/02/2022, Neha gupta wrote:
> > > That is all the code I have. How can I provide reproducible code?
> > >
> > > How can I save this result?
> > >
> > > On Thu, Feb 17, 2022 at 8:00 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
> > >
> > >> You pipe the filter but do not save the result. A reproducible example
> > >> might help.
> > >> Tim
> > >>
> > >> -----Original Message-----
> > >> From: R-help <r-help-bounces at r-project.org> On Behalf Of Neha gupta
> > >> Sent: Thursday, February 17, 2022 1:55 PM
> > >> To: r-help mailing list <r-help at r-project.org>
> > >> Subject: [R] Problem with data distribution
> > >>
> > >> Hello everyone
> > >>
> > >> I have a dataset with an output variable "bug" that has the following
> > >> values (at the bottom of this email). My advisor asked me to provide the
> > >> data distribution of bugs with 0 values and bugs with more than 0 values.
> > >>
> > >> data = readARFF("synapse.arff")
> > >> data2 = readARFF("synapse.arff")
> > >> data$bug
> > >> library(tidyverse)
> > >> data %>%
> > >>   filter(bug == 0)
> > >> data2 %>%
> > >>   filter(bug >= 1)
> > >> boxplot(data2$bug, data$bug, range=0)
> > >>
> > >> But both graphs are exactly the same. How is that possible? What am I
> > >> doing wrong?
> > >>
> > >>
> > >> data$bug
> > >>   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
> > >>  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
> > >>  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
> > >> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
> > >> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
> > >> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> > >>
> > >> [[alternative HTML version deleted]]
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=NxfkBJHBnd8naYPQTd9Z8dZ2m-RCwh_lpGvHVQ8MwYQ&e>
> > >> PLEASE do read the posting guide
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=exznSElUW1tc6ajt0C8uw5cR8ZqwHRD6tUPAarFYdYo&e>
> >> and provide commented, minimal, self-contained, reproducible code.