Hello,

I have a dataframe (t1) with many columns, but the one I care about is this:

> unique(t1$sex_chromosome_aneuploidy_f22019_0_0)
[1] NA    "Yes"

It has these two values.

I would like to remove from my dataframe t1 all rows which have "Yes" in t1$sex_chromosome_aneuploidy_f22019_0_0.

I tried selecting those rows with "Yes" via:

t11=t1[t1$sex_chromosome_aneuploidy_f22019_0_0=="Yes",]

but I got t11, which has the exact same number of rows as t1.

If I do:

> table(t1$sex_chromosome_aneuploidy_f22019_0_0)

Yes
620

So there are for sure 620 rows which have "Yes". How do I remove those from my t1 data frame?

Thanks,
Ana
Hello,

You have to use is.na to deal with the NA values.

    t1 <- data.frame(sex_chromosome_aneuploidy_f22019_0_0 = c(NA, "Yes"),
                     other = 1:2)

    i <- t1$sex_chromosome_aneuploidy_f22019_0_0 == "Yes" &
         !is.na(t1$sex_chromosome_aneuploidy_f22019_0_0)
    i
    t1[i, ]

Hope this helps,

Rui Barradas

At 19:58 on 03/10/19, Ana Marija wrote:
> I tried selecting those rows with "Yes" via:
>
> t11=t1[t1$sex_chromosome_aneuploidy_f22019_0_0=="Yes",]
>
> but I got t11 which has the exact same number of rows as t1.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
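The reason the original attempt returned as many rows as t1 can be reproduced on a toy data frame. The sketch below is only an illustration: the data frame `toy` and its values are made up and are not taken from the real t1. A logical row subscript that contains NA yields an all-NA row for every NA entry, so nothing appears to be dropped.

```r
# Hypothetical stand-in for t1: three NA values and two "Yes" values.
toy <- data.frame(flag = c(NA, "Yes", NA, "Yes", NA),
                  other = 1:5,
                  stringsAsFactors = FALSE)

# Comparing with == yields NA (not FALSE) wherever flag is NA.
idx_raw <- toy$flag == "Yes"

# Each NA in a logical row subscript produces a row filled with NA,
# so the result has as many rows as toy itself.
rows_raw <- toy[idx_raw, ]

# Adding !is.na() turns those NA entries into FALSE.
idx_safe <- toy$flag == "Yes" & !is.na(toy$flag)
rows_safe <- toy[idx_safe, ]

print(nrow(rows_raw))   # 5
print(nrow(rows_safe))  # 2
```

On the real data, comparing nrow(t1) with the number of rows in the subset is a quick way to notice this situation.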
Hello,

I expected the code you posted to work just as you presumed it would, but without a reproducible example I can only speculate as to why it didn't. If indeed you only want to remove the rows of t1 in which the t1$sex_chromosome_aneuploidy_f22019_0_0 column is undefined (NA), you could try the following:

    t11 <- t1[ !is.na(t1$sex_chromosome_aneuploidy_f22019_0_0), ]

HTH,

Bill.

W. Michels, Ph.D.

On Thu, Oct 3, 2019 at 11:59 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
> I tried selecting those rows with "Yes" via:
>
> t11=t1[t1$sex_chromosome_aneuploidy_f22019_0_0=="Yes",]
>
> but I got t11 which has the exact same number of rows as t1.
Hello,

Then it's easier, is.na alone will do it.

    j <- is.na(t1$sex_chromosome_aneuploidy_f22019_0_0)
    t1[j, ]

Hope this helps,

Rui Barradas

At 20:29 on 03/10/19, Ana Marija wrote:
> Hi Rui,
>
> sorry for confusion, I would only need to extract from my t1 dataframe
> rows which have NA in sex_chromosome_aneuploidy_f22019_0_0
> in other words to REMOVE rows with "Yes" and to keep rows with NA. How
> to do that?
>
> On Thu, Oct 3, 2019 at 2:26 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> You have to use is.na to get the NA values.
Hello again,

Sometimes it's better to create indices for each condition and then assemble them with logical operations as needed.

    i <- t1$sex_chromosome_aneuploidy_f22019_0_0 == "Yes"
    j <- is.na(t1$sex_chromosome_aneuploidy_f22019_0_0)

    t1[!i | j, ]

j means is.na(.)
!i means (.) != "Yes"

The two are combined with | rather than & because i is NA wherever the column is NA, and NA | TRUE is TRUE while NA & TRUE is still NA.

Hope this helps,

Rui Barradas

At 21:21 on 03/10/19, Rui Barradas wrote:
> Then it's easier, is.na alone will do it.
>
> j <- is.na(t1$sex_chromosome_aneuploidy_f22019_0_0)
> t1[j, ]
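When per-condition indices are combined, the choice of operator matters as soon as NA is involved, because R's logical operators are three-valued. The following is a small sketch on a made-up vector (the value "No" is hypothetical and only added to show that other values are kept) comparing the two ways of assembling the indices.

```r
# Hypothetical column with a third value, "No", added for illustration.
x <- c(NA, "Yes", "No")

i <- x == "Yes"   # NA where x is NA
j <- is.na(x)     # never NA

# & keeps the NA coming from !i; | resolves it because NA | TRUE is TRUE.
and_idx <- !i & j
or_idx  <- !i | j

print(and_idx)  # NA FALSE FALSE
print(or_idx)   # TRUE FALSE TRUE

# Only the NA-free index is safe to use as a subscript.
kept <- x[or_idx]
print(kept)     # NA "No"
```

Checking an index with anyNA() before using it as a subscript is a cheap safeguard.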
Hi,

On 10/3/19 11:58, Ana Marija wrote:
> I would like to remove from my dataframe t1 all rows which have "Yes"
> in t1$sex_chromosome_aneuploidy_f22019_0_0
>
> I tried selecting those rows with "Yes" via:
>
> t11=t1[t1$sex_chromosome_aneuploidy_f22019_0_0=="Yes",]

It's important that you realize that instead of removing rows with "Yes" this actually keeps them.

> but I got t11 which has the exact same number of rows as t1.

which should not be outrageously unexpected. After all it's not entirely impossible that when you selected the rows with "Yes" you selected them all.

> If I do:
>> table(t1$sex_chromosome_aneuploidy_f22019_0_0)
>
> Yes
> 620
>
> So there is for sure 620 rows which have "Yes".

This **seems** to indicate that all the rows contain "Yes". And this would explain why when you selected the rows with "Yes" you selected them all.

> How to remove those
> from my t1 data frame?

Unfortunately, this is a situation where we cannot trust the appearances.

Appearances: it **looks** like all the rows contain "Yes" and this seems to be confirmed by the fact that selecting the rows with "Yes" didn't drop any rows.

The truth: the truth is that there are some rows that don't contain "Yes". However by default table() doesn't report counts for NAs so you need to explicitly ask for that:

  > table(t1$sex_chromosome_aneuploidy_f22019_0_0, useNA="always")

   Yes <NA>
   620  111

So now you know how many rows to expect after removing those with "Yes".

Another complication is that the == operator propagates NAs so it tends to return a subscript that is not safe to use for subsetting because it's contaminated with NAs.
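The effect of useNA can be checked on a small made-up vector. This is only a sketch: the vector x and its counts are hypothetical and unrelated to the real column.

```r
# Hypothetical column: four "Yes" values and three NA values.
x <- c("Yes", NA, "Yes", "Yes", NA, "Yes", NA)

# Default: NA is silently left out of the counts.
default_counts <- table(x)

# useNA = "always" adds an <NA> entry, even when its count is zero.
full_counts <- table(x, useNA = "always")

print(default_counts)
print(full_counts)

# With NA included, the counts add up to the length of the vector.
print(sum(default_counts))  # 4
print(sum(full_counts))     # 7
```

Comparing sum(table(x)) with length(x) is therefore one way to spot hidden NAs.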
Other people have suggested that you use is.na(t1$sex_chromosome_aneuploidy_f22019_0_0) or other more complicated things (like t1$sex_chromosome_aneuploidy_f22019_0_0 != "Yes" & is.na(t1$sex_chromosome_aneuploidy_f22019_0_0)) to work around this.

However the simplest and safest way to translate "compute the index of the rows that match string 'babar'" into R code is with:

  t1$sex_chromosome_aneuploidy_f22019_0_0 %in% "babar"

Another advantage of using %in% is that you can have more than one string on the right. For example

  t1$sex_chromosome_aneuploidy_f22019_0_0 %in% c("babar", "foo")

will produce an index that can be used to select the rows that match "babar" or "foo". To remove these rows, use

  !(t1$sex_chromosome_aneuploidy_f22019_0_0 %in% c("babar", "foo"))

instead (parentheses around the %in% operation highly recommended for readability).

The bottom line is that %in% is almost always better than == for computing a subscript because it doesn't propagate NAs.

Hope this helps,

H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
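Applied to the question in this thread, the %in% approach might look like the sketch below. The data frame `toy` and the extra value "No" are made up for illustration; only the use of %in% comes from the message above.

```r
# Hypothetical stand-in for t1.
toy <- data.frame(flag = c(NA, "Yes", "No", NA, "Yes"),
                  other = 1:5,
                  stringsAsFactors = FALSE)

# %in% returns FALSE, never NA, for the NA entries.
is_yes <- toy$flag %in% "Yes"

# Remove the rows with "Yes"; rows with NA or "No" are kept.
without_yes <- toy[!is_yes, ]

# Remove the rows matching any of several strings.
without_both <- toy[!(toy$flag %in% c("Yes", "No")), ]

print(is_yes)              # FALSE TRUE FALSE FALSE TRUE
print(nrow(without_yes))   # 3
print(nrow(without_both))  # 2
```

Because the index contains no NA, the number of rows removed always equals the number of matches.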
I think the problem may lie in your understanding of what "==" does with NA and/or what "[]" does with NA.

  > x <- c(NA, "Yes")
  > x == "Yes"
  [1]   NA TRUE

Since you say you DON'T want the rows with "Yes", you just want

    x[is.na(x)]

or in your case

    t11 <- t1[is.na(t1$sex_chromosome_aneuploidy_f22019_0_0),]

or if there could be other values than "Yes" that you want to keep,

    is.definitely <- function (x, y) {
        !is.na(x) & !is.na(y) & x == y
    }

    t11 <- t1[!is.definitely(t1$sex_chromosome_aneuploidy_f22019_0_0, "Yes"),]

On Fri, 4 Oct 2019 at 07:59, Ana Marija <sokovic.anamarija at gmail.com> wrote:
> I would like to remove from my dataframe t1 all rows which have "Yes"
> in t1$sex_chromosome_aneuploidy_f22019_0_0
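A quick check of the helper on a made-up vector may make its behaviour clearer. The values below, including "No", are hypothetical; the function body is the one given in the message above.

```r
# Helper as given in the message above.
is.definitely <- function(x, y) {
  !is.na(x) & !is.na(y) & x == y
}

# Hypothetical values, with "No" standing in for "other values to keep".
x <- c(NA, "Yes", "No")

# TRUE only where x is known and equal to "Yes"; never NA.
hit <- is.definitely(x, "Yes")
print(hit)    # FALSE TRUE FALSE

# Negating it keeps both the NA and the "No" entries.
keep <- x[!hit]
print(keep)   # NA "No"

# An NA on the right-hand side matches nothing.
none <- is.definitely(x, NA)
print(none)   # FALSE FALSE FALSE
```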