Hello everyone
I have a dataset with output variable "bug" having the following
values (at
the bottom of this email). My advisor asked me to provide data distribution
of bugs with 0 values and bugs with more than 0 values.
data = readARFF("synapse.arff")
data2 = readARFF("synapse.arff")
data$bug
library(tidyverse)
data %>%
filter(bug == 0)
data2 %>%
filter(bug >= 1)
boxplot(data2$bug, data$bug, range=0)
But both the graphs are exactly the same, how is it possible? Where I am
doing wrong?
data$bug
[1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
0 4 1 0
[40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
0 0 0 0
[79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
7 0 0 1
[118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
0 1 0 0
[157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
0 0 0 1
[196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
[[alternative HTML version deleted]]
You pipe the filter but do not save the result. A reproducible example might
help.
Tim
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Neha gupta
Sent: Thursday, February 17, 2022 1:55 PM
To: r-help mailing list <r-help at r-project.org>
Subject: [R] Problem with data distribution
[External Email]
Hello everyone
I have a dataset with output variable "bug" having the following
values (at the bottom of this email). My advisor asked me to provide data
distribution of bugs with 0 values and bugs with more than 0 values.
data = readARFF("synapse.arff")
data2 = readARFF("synapse.arff")
data$bug
library(tidyverse)
data %>%
filter(bug == 0)
data2 %>%
filter(bug >= 1)
boxplot(data2$bug, data$bug, range=0)
But both the graphs are exactly the same, how is it possible? Where I am doing
wrong?
data$bug
[1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
0 4 1 0
[40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
0 0 0 0
[79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
7 0 0 1
[118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
0 1 0 0
[157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
0 0 0 1
[196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=NxfkBJHBnd8naYPQTd9Z8dZ2m-RCwh_lpGvHVQ8MwYQ&ePLEASE
do read the posting guide
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=exznSElUW1tc6ajt0C8uw5cR8ZqwHRD6tUPAarFYdYo&eand
provide commented, minimal, self-contained, reproducible code.
data <- data %>% filter(bug==0) is one option, but you need to save the
output somewhere.
Can you tell us more about the expected distribution of bug==0? More than a
count of zero bugs.... number of zeros between non-zeros, or something else?
You could provide the data but rename variables and treatments. Alternatively
you could make fake data. It doesn't have to have the same distribution as
the real data.
If bugs is the only variable you have then I could recover the data from what
you printed (though it will take some effort to remove [#]). For our purposes
this would also work:
sample(0:5,50,replace=TRUE) #draws 50 values with replacement from 0 through 5
inclusive
sample(c(0,0,0,0,1,1,1,2,3,4,5),50,replace=TRUE) #draws 50 samples with
replacement from a list of values
abs( round( rnorm(50, mean=2, sd=2),0)) #generates a random number, rounds it to
integer, and takes the absolute value.
To make it fully reproducible in this approach one needs to set the random seed.
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Neha gupta
Sent: Thursday, February 17, 2022 1:55 PM
To: r-help mailing list <r-help at r-project.org>
Subject: [R] Problem with data distribution
[External Email]
Hello everyone
I have a dataset with output variable "bug" having the following
values (at the bottom of this email). My advisor asked me to provide data
distribution of bugs with 0 values and bugs with more than 0 values.
data = readARFF("synapse.arff")
data2 = readARFF("synapse.arff")
data$bug
library(tidyverse)
data %>%
filter(bug == 0)
data2 %>%
filter(bug >= 1)
boxplot(data2$bug, data$bug, range=0)
But both the graphs are exactly the same, how is it possible? Where I am doing
wrong?
data$bug
[1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
0 4 1 0
[40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
0 0 0 0
[79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
7 0 0 1
[118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
0 1 0 0
[157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
0 0 0 1
[196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=NxfkBJHBnd8naYPQTd9Z8dZ2m-RCwh_lpGvHVQ8MwYQ&ePLEASE
do read the posting guide
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=TZx8pDTF9x1Tu4QZW3x_99uu9RowVjAna39KcjCXSElI1AOk1C_6L2pR8YIVfiod&s=exznSElUW1tc6ajt0C8uw5cR8ZqwHRD6tUPAarFYdYo&eand
provide commented, minimal, self-contained, reproducible code.