Maybe what you want is to recode your data as two data sets.
One data set records bug versus no bug: what is the probability of a class
having one or more bugs?
The other data set contains only the classes with bugs: given that a class has
bugs, how many will it have?
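
A minimal sketch of that recoding in R, assuming the bug counts are in a vector `bug`. The vector is rebuilt here from the tabulated counts quoted further down in this thread rather than read from the original file, and the names `has_bug` and `bug_pos` are only illustrative.

```r
# Rebuild the bug counts from the table quoted in this thread
# (values 0, 1, 2, 3, 4, 5, 7 with their frequencies); order does not matter here.
bug <- rep(c(0, 1, 2, 3, 4, 5, 7), times = c(162, 40, 9, 7, 2, 1, 1))

# Data set 1: bug versus no bug (TRUE = one or more bugs)
has_bug <- bug > 0
print(table(has_bug))
p_bug <- mean(has_bug)      # estimated probability of one or more bugs
print(p_bug)

# Data set 2: bugs only (classes with at least one bug)
bug_pos <- bug[bug > 0]
print(table(bug_pos))       # given bugs, how many?
mean_pos <- mean(bug_pos)   # average count among classes with bugs
print(mean_pos)
```

The first data set suits a yes/no (classification) question, while the second keeps the counts for the classes where the answer is yes.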
Tim
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Neha gupta
Sent: Thursday, February 17, 2022 4:54 PM
To: Bert Gunter <bgunter.4567 at gmail.com>
Cc: r-help mailing list <r-help at r-project.org>
Subject: Re: [R] Problem with data distribution
:) :)
On Thu, Feb 17, 2022 at 10:37 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> imo, with such simple data, a plot is mere chartjunk. A simple table
> (= the distribution) would suffice and be more informative:
>
> > table(bug) ## bug is a vector. No data frame is needed
>
>   0   1   2   3   4   5   7   ## bug count
> 162  40   9   7   2   1   1   ## number of cases with the given count
>
> You or others may disagree, of course.
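
For reference, a sketch of how that table and the bug = 0 versus bug > 0 split could be produced. The vector is rebuilt from the counts shown above so the lines run on their own; `prop.table()` and the group labels are additions for illustration, not part of the original reply.

```r
# Rebuild the vector from the tabulated counts (order does not matter for a table)
bug <- rep(c(0, 1, 2, 3, 4, 5, 7), times = c(162, 40, 9, 7, 2, 1, 1))

counts <- table(bug)                       # full distribution of bug counts
shares <- round(prop.table(counts), 3)     # the same distribution as proportions

# Collapse to the two groups of interest: bug = 0 versus bug > 0
groups <- table(ifelse(bug == 0, "bug = 0", "bug > 0"))

print(counts)
print(shares)
print(groups)
```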
>
> Bert Gunter
>
>
>
> On Thu, Feb 17, 2022 at 11:56 AM Neha gupta <neha.bologna90 at gmail.com> wrote:
> >
> > Ebert and Rui, thank you for providing the tips (in fact, for
> > providing the answer I needed).
> >
> > Yes, you are right that a boxplot of all zero values will not make
> > sense. Maybe a histogram will work.
> >
> > I am providing a few details of my data here and the context of the
> > question I asked.
> >
> > My data is about bugs/defects in different classes of a large
> > software system. I have to predict which classes will contain bugs and
> > which will be free of bugs (bug = 0). I trained ML models and made
> > predictions, but my advisor asked me to first provide the data
> > distribution of bugs, e.g. how many classes have bugs (bug > 0) and
> > how many are free of bugs (bug = 0).
> >
> > That is why I need to provide the data distribution of both types of
> > values (i.e. bug = 0 and bug > 0).
> >
> > Thank you again.
> >
> > On Thu, Feb 17, 2022 at 8:28 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >
> > > Hello,
> > >
> > > In your original post you read the same file "synapse.arff" twice,
> > > apparently to filter each copy by its own criterion. You don't
> > > need to do that; read it once and filter that one object by
> > > different criteria.
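
A sketch of reading once and keeping each filtered result, using base `subset()` so it runs without extra packages. The small data frame stands in for the contents of "synapse.arff", and `no_bugs` / `with_bugs` are illustrative names. With dplyr the same idea is to assign the result, for example `with_bugs <- data %>% filter(bug > 0)`.

```r
# Stand-in for the data read ONCE from the file (assumed values, not the real data)
data <- data.frame(bug = c(0, 1, 0, 0, 2, 0, 4, 1, 0, 3))

# Filter the same object by two criteria and SAVE each result;
# a filter whose result is not assigned leaves `data` unchanged.
no_bugs   <- subset(data, bug == 0)
with_bugs <- subset(data, bug > 0)

print(nrow(no_bugs))            # classes without bugs
print(nrow(with_bugs))          # classes with at least one bug
print(summary(with_bugs$bug))   # summary of the positive counts only
```

Plotting `with_bugs$bug` and `no_bugs$bug` then shows two different groups, whereas plotting the `bug` column of two unfiltered copies of the same file shows the same data twice.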
> > >
> > > As for the data as posted, I have read it in with the following code:
> > >
> > >
> > > x <- "
> > > 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0
> > > 0 0 0
> > > 4 1 0
> > > 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1
> > > 0 0 0
> > > 0 0 0
> > > 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0
> > > 0 0 7
> > > 0 0 1
> > > 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0
> > > 0 0 0
> > > 1 0 0
> > > 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1
> > > 1 0 0
> > > 0 0 1
> > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0 "
> > > bug <- scan(text = x)
> > > data <- data.frame(bug)
> > >
> > >
> > > This is not the right way to post data; the posting guide asks you to
> > > post the output of
> > >
> > >
> > > dput(data)
> > > structure(list(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0,
> > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
> > > 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0, 3, 0, 0,
> > > 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
> > > 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
> > > 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0, 1, 1, 0, 2, 0, 3,
> > > 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 3, 2, 1, 1,
> > > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
> > > 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 4, 1, 1,
> > > 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > > 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)), class = "data.frame",
> > > row.names = c(NA, -222L))
> > >
> > >
> > >
> > > This can be copied into an R session and the data set recreated
> > > with
> > >
> > > data <- structure(etc)
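
A small round-trip sketch of that workflow, with a hypothetical four-row data frame `small` in place of the real data. In practice the `dput()` output is pasted into the message by hand; `capture.output()` and `parse()` only simulate the copy and paste here.

```r
# Hypothetical small data set standing in for the real one
small <- data.frame(bug = c(0, 1, 0, 2))

# What the poster would paste into the email: the text printed by dput()
txt <- paste(capture.output(dput(small)), collapse = "\n")
cat(txt, "\n")

# What a reader would do with the pasted text: evaluate it to rebuild the object
rebuilt <- eval(parse(text = txt))

# Check that the rebuilt object matches the original
print(all.equal(small, rebuilt))
```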
> > >
> > >
> > > Now the boxplots.
> > >
> > > (Why would you want to plot a vector of all zeros, btw?)
> > >
> > >
> > >
> > > library(dplyr)
> > >
> > > boxplot(filter(data, bug == 0))            # nonsense: every value is zero
> > > boxplot(filter(data, bug > 0), range = 0)  # range = 0: whiskers reach the extremes
> > >
> > > # Another way
> > > data %>%
> > >   filter(bug > 0) %>%
> > >   boxplot(range = 0)
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > >
> > > At 19:03 on 17/02/2022, Neha gupta wrote:
> > > > That is all the code I have. How can I provide reproducible code?
> > > >
> > > > How can I save this result?
> > > >
> > > > On Thu, Feb 17, 2022 at 8:00 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
> > > >
> > > >> You pipe the filter but do not save the result. A reproducible
> > > >> example might help.
> > > >> Tim
> > > >>
> > > >> -----Original Message-----
> > > >> From: R-help <r-help-bounces at r-project.org> On Behalf Of Neha gupta
> > > >> Sent: Thursday, February 17, 2022 1:55 PM
> > > >> To: r-help mailing list <r-help at r-project.org>
> > > >> Subject: [R] Problem with data distribution
> > > >>
> > > >> Hello everyone
> > > >>
> > > >> I have a dataset with output variable "bug" having the following
> > > >> values (at the bottom of this email). My advisor asked me to provide
> > > >> the data distribution of bugs with 0 values and bugs with more than 0
> > > >> values.
> > > >>
> > > >> data = readARFF("synapse.arff")
> > > >> data2 = readARFF("synapse.arff")
> > > >> data$bug
> > > >> library(tidyverse)
> > > >> data %>%
> > > >>   filter(bug == 0)
> > > >> data2 %>%
> > > >>   filter(bug >= 1)
> > > >> boxplot(data2$bug, data$bug, range=0)
> > > >>
> > > >> But both the graphs are exactly the same. How is that possible?
> > > >> What am I doing wrong?
> > > >>
> > > >>
> > > >> data$bug
> > > >>   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
> > > >>  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
> > > >>  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
> > > >> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
> > > >> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
> > > >> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> > > >>
> > > >>
> > > >> [[alternative HTML version deleted]]
> > > >>
> > > >> ______________________________________________
> > > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=NxfkBJHBnd8naYPQTd9Z8dZ2m-RCwh_lpGvHVQ8MwYQ&e=
> > > >> PLEASE do read the posting guide
> > > >> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=exznSElUW1tc6ajt0C8uw5cR8ZqwHRD6tUPAarFYdYo&e=
> > > >> and provide commented, minimal, self-contained, reproducible code.