Hello everyone,

I have a dataset with an output variable "bug" that has the following values (at the bottom of this email). My advisor asked me to provide the data distribution of bugs with 0 values and bugs with more than 0 values.

data = readARFF("synapse.arff")
data2 = readARFF("synapse.arff")
data$bug
library(tidyverse)
data %>% filter(bug == 0)
data2 %>% filter(bug >= 1)
boxplot(data2$bug, data$bug, range=0)

But both graphs are exactly the same. How is that possible? What am I doing wrong?

data$bug
  [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
 [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
 [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
[118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
[157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
[196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
You pipe the data through filter() but do not save the result, so data and data2 are still the same unfiltered dataset when boxplot() is called. A reproducible example might help.

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Neha gupta
Sent: Thursday, February 17, 2022 1:55 PM
To: r-help mailing list <r-help at r-project.org>
Subject: [R] Problem with data distribution

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
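The point about saving the filtered result can be sketched as follows. This is a minimal illustration, not the poster's actual script: the small data frame is a made-up stand-in for the contents of synapse.arff, and the names zero_bugs and nonzero_bugs are hypothetical.

```r
library(dplyr)

# Made-up stand-in for the data read from synapse.arff (assumption)
data <- data.frame(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 4, 1, 0, 3))

# filter() returns a new data frame and leaves `data` untouched,
# so the result has to be assigned to be used later
zero_bugs    <- data %>% filter(bug == 0)
nonzero_bugs <- data %>% filter(bug >= 1)

# Row counts of the two subsets
nrow(zero_bugs)
nrow(nonzero_bugs)

# Plot the two saved subsets instead of the unfiltered data
boxplot(nonzero_bugs$bug, zero_bugs$bug, range = 0,
        names = c("bug >= 1", "bug == 0"))
```

Note that the box for bug == 0 will be a flat line, since every value in that subset is zero; a table() of the nonzero counts may be more informative than a boxplot here.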
data <- data %>% filter(bug == 0)

is one option, but you need to save the output somewhere.

Can you tell us more about the expected distribution of bug == 0? Do you want more than a count of zero bugs, such as the number of zeros between non-zeros, or something else?

You could provide the data but rename variables and treatments. Alternatively, you could make fake data. It doesn't have to have the same distribution as the real data. If bug is the only variable you have, then I could recover the data from what you printed (though it will take some effort to remove the [#] index markers). For our purposes any of these would also work:

sample(0:5, 50, replace=TRUE)  # draws 50 values with replacement from 0 through 5 inclusive
sample(c(0,0,0,0,1,1,1,2,3,4,5), 50, replace=TRUE)  # draws 50 values with replacement from a list of values
abs(round(rnorm(50, mean=2, sd=2), 0))  # generates 50 normal random numbers, rounds each to an integer, and takes the absolute value

To make this approach fully reproducible, one needs to set the random seed.
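Both suggestions above (seeded fake data, and recovering values from printed output) can be sketched in base R. This is only an illustration under stated assumptions: the seed value 42 is arbitrary, the names pool, fake_bug, printed and recovered are hypothetical, and the printed string is just the first line of the output posted in the question.

```r
# Part 1: fake data made reproducible with a fixed seed (42 is arbitrary)
pool <- c(0, 0, 0, 0, 1, 1, 1, 2, 3, 4, 5)
set.seed(42)
fake_bug <- sample(pool, 50, replace = TRUE)

# Split into the two groups of interest and summarise each
table(fake_bug == 0)           # counts of zero vs. non-zero values
table(fake_bug[fake_bug > 0])  # distribution of the non-zero values

# Part 2: recover numbers from printed R output by stripping the
# [n] index markers, then reading the remaining tokens
printed <- "[1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0"
recovered <- scan(text = gsub("\\[[0-9]+\\]", "", printed), quiet = TRUE)
length(recovered)
```

Re-running the script with the same seed draws the same fake_bug vector, which is what lets others on the list reproduce a result exactly.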