Dear Neha gupta,

On 2022-02-17 1:54 p.m., Neha gupta wrote:
> Hello everyone
>
> I have a dataset with output variable "bug" having the following values (at
> the bottom of this email). My advisor asked me to provide the data
> distribution of bugs with 0 values and bugs with more than 0 values.
>
> data = readARFF("synapse.arff")
> data2 = readARFF("synapse.arff")
> data$bug
> library(tidyverse)
> data %>%
>   filter(bug == 0)
> data2 %>%
>   filter(bug >= 1)
> boxplot(data2$bug, data$bug, range=0)
>
> But both graphs are exactly the same. How is that possible? Where am I
> going wrong?

As it turns out, you're doing several things wrong.

First, you're not using pipes and filter() correctly. That is, you don't
do anything with the filtered versions of the data sets: filter() returns
a new, filtered data frame, which your code prints and then discards.
You're apparently under the incorrect impression that filtering modifies
the original data set. Because data and data2 are unchanged,
boxplot(data2$bug, data$bug, range=0) draws the same complete bug column
twice.

Second, you're greatly complicating a simple problem. You don't need to
read the data twice and keep two versions of the data set. As well,
processing the data with pipes and filter() is entirely unnecessary. The
following code works:

   with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))

Third, and most fundamentally, the parallel boxplots you're apparently
trying to construct don't really make sense. The first "boxplot" is just
a horizontal line at 0 and so conveys no information. Why not just plot
the nonzero values if that's what you're interested in?

Fourth, you didn't share your data in a convenient form.
I was able to reconstruct them via

  bug <- scan()
  0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
  0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
  1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
  0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
  0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0

  data <- data.frame(bug)

Finally, it's better not to post to the list in plain-text email, rather
than html (as the posting guide suggests).

I hope this helps,
 John

>
> data$bug
>   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
>  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
>  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
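A minimal sketch of the first point in the reply above, assuming the dplyr package (loaded by the tidyverse; it supplies filter() and re-exports the %>% pipe). The short bug vector is made up for illustration and stands in for the data read from synapse.arff; the names zero_bugs and some_bugs are hypothetical.

```r
library(dplyr)  # provides filter() and re-exports the %>% pipe

# Hypothetical stand-in for the data read from synapse.arff
data <- data.frame(bug = c(0, 1, 0, 0, 2, 0, 4, 1, 0))

# filter() returns a new data frame; it does not change `data` in place.
# Without an assignment, the result is printed and then discarded.
data %>% filter(bug == 0)
nrow(data)  # 9: the original data frame is untouched

# Save each filtered result under its own name instead
zero_bugs <- data %>% filter(bug == 0)
some_bugs <- data %>% filter(bug >= 1)

nrow(zero_bugs)  # 5
nrow(some_bugs)  # 4
```

Either saved data frame can then be passed on, for example boxplot(some_bugs$bug, range = 0), and there is no need to read the file a second time.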
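For the third point in the first reply, one way to report the distribution the advisor asked for is a count of zero versus nonzero values, plus a plot of the nonzero values alone. This is a base-R sketch; the bug vector is a hypothetical excerpt rather than the posted data, and the factor labels are illustrative choices.

```r
# Hypothetical excerpt of bug counts; substitute the full data$bug vector
bug <- c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 3, 0, 5, 0, 1)

# How many modules have zero bugs versus at least one?
has_bug <- factor(bug > 0, levels = c(FALSE, TRUE),
                  labels = c("0 bugs", ">= 1 bug"))
counts <- table(has_bug)
counts              # 9 with 0 bugs, 6 with >= 1 bug
prop.table(counts)  # the same counts as proportions

# The "boxplot" of the zeros is degenerate: all five statistics are 0
boxplot.stats(bug[bug == 0])$stats

# Distribution of the nonzero counts only
nonzero <- bug[bug > 0]
table(nonzero)
# range = 0 makes the whiskers extend to the minimum and maximum
boxplot(nonzero, range = 0, main = "Modules with at least one bug",
        ylab = "bug")
```

The zero/nonzero table carries all the information the first "boxplot" would have shown, so only the nonzero values need a plot.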
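On the fourth point in the first reply, base R's dput() is a common way to share data on the list, because it prints R code that recreates the object exactly. The sketch uses a short hypothetical vector; with the real data one would run dput(data$bug) and paste the output into the message. scan(text = ...) is shown as a scripted alternative to the interactive scan() used in the reply.

```r
# Hypothetical short vector standing in for data$bug
bug <- c(0, 1, 0, 0, 0, 1, 2, 0, 4, 1)

# dput() prints R code that recreates the object; paste that into the email
dput(bug)
# c(0, 1, 0, 0, 0, 1, 2, 0, 4, 1)

# A reader rebuilds the vector by running the pasted code
shared <- paste(capture.output(dput(bug)), collapse = "\n")
rebuilt <- eval(parse(text = shared))
identical(rebuilt, bug)  # TRUE

# Non-interactive alternative to typing values after scan():
# read whitespace-separated numbers from a string (no "[1]" index prefixes)
from_scan <- scan(text = "0 1 0 0 0 1 2 0 4 1", quiet = TRUE)
identical(from_scan, bug)  # TRUE
```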
Dear Neha gupta,

In the last point, I meant to say, "Finally, it's better to post to the
list in plain-text email, rather than html (as the posting guide
suggests)." (I accidentally inserted a "not" in this sentence.)

Sorry,
 John

On 2022-02-17 2:21 p.m., John Fox wrote:
> Finally, it's better not to post to the list in plain-text email, rather
> than html (as the posting guide suggests).
Dear John, thanks a lot for the detailed answer.

Yes, I am not an expert in the R language, and when a problem comes up, I
google it or post it on these forums. (I have only a little experience
with ML in R.)

On Thu, Feb 17, 2022 at 8:21 PM John Fox <jfox at mcmaster.ca> wrote:
> As it turns out, you're doing several things wrong.
Dear Neha gupta,

I hope that I'm not overstepping my role when I say that googling
solutions to specific problems is an inefficient way to learn a
programming language, and will probably waste your time in the long
run. There are many good introductions to R.

Best,
 John

On 2022-02-17 2:27 p.m., Neha gupta wrote:
> Yes, I am not an expert in the R language, and when a problem comes up, I
> google it or post it on these forums. (I have only a little experience
> with ML in R.)