Hello,

I have a data frame with one column:

> remove
                                V1
1 ABAFT_g_4RWG569_BI_SNP_A10_35096
2 ABAFT_g_4RWG569_BI_SNP_B12_35130
3 ABAFT_g_4RWG569_BI_SNP_E09_35088
4 ABAFT_g_4RWG569_BI_SNP_E12_35136
5 ABAFT_g_4RWG569_BI_SNP_F11_35122
6 ABAFT_g_4RWG569_BI_SNP_F12_35138
7 ABAFT_g_4RWG569_BI_SNP_G07_35060
8 ABAFT_g_4RWG569_BI_SNP_G12_35140

I want to remove these 8 entries (listed in the "remove" data frame) from this vector, which looks like this:

> head(celFiles)
[1] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL"
[2] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL"
[3] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL"
[4] "GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL"
[5] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL"
[6] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL"
...

I tried doing this:

b= celFiles[!basename(celFiles) %in% as.character(remove$V1)]

but none of the 8 entries in the "remove" data frame have been removed.

Please advise,
Ana
Rui Barradas
2020-Oct-21  21:41 UTC
[R] how do I remove entries in data frame from a vector
Hello,
This is probably because basename() keeps the file extension. Try instead:

filename <- sub("(^[^\\.]*)\\..+$", "\\1", basename(celFiles))
celFiles[!filename %in% as.character(remove$V1)]
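
To see why the original attempt matched nothing, here is a small sketch on made-up data. The directory "/some/dir" and the two-element vectors are assumptions for illustration only; the file names are taken from the post.

```r
# Hypothetical miniature of the data: one file to drop, one to keep.
# The directory "/some/dir" is made up for this example.
celFiles <- c("/some/dir/ABAFT_g_4RWG569_BI_SNP_A10_35096.CEL",
              "/some/dir/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL")
remove <- data.frame(V1 = "ABAFT_g_4RWG569_BI_SNP_A10_35096",
                     stringsAsFactors = FALSE)

# basename() drops the directory but keeps ".CEL", so nothing matches.
naive <- basename(celFiles) %in% remove$V1
print(naive)

# Strip everything from the first dot onward, then compare.
filename <- sub("(^[^\\.]*)\\..+$", "\\1", basename(celFiles))
kept <- celFiles[!filename %in% as.character(remove$V1)]
print(kept)
```

With this toy input only the A01 file should remain in kept.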
Hope this helps,
Rui Barradas
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Rui Barradas
2020-Oct-21  21:47 UTC
[R] how do I remove entries in data frame from a vector
Hello,

To remove the file extension it's much easier to use base R

filename <- tools::file_path_sans_ext(basename(celFiles))

Hope this helps,

Rui Barradas
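
A short sketch of the file_path_sans_ext() approach, again on made-up data (the directory "/some/dir" and the name "sample.v2.CEL" are hypothetical). It also shows one difference worth knowing: file_path_sans_ext() strips only the last extension, whereas the earlier regular expression cuts at the first dot in the base name.

```r
# Hypothetical miniature of the data; "/some/dir" is made up.
celFiles <- c("/some/dir/ABAFT_g_4RWG569_BI_SNP_A10_35096.CEL",
              "/some/dir/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL")
remove <- data.frame(V1 = "ABAFT_g_4RWG569_BI_SNP_A10_35096",
                     stringsAsFactors = FALSE)

# Drop the directory, then the trailing extension, then filter.
filename <- tools::file_path_sans_ext(basename(celFiles))
b <- celFiles[!filename %in% as.character(remove$V1)]
print(b)

# The two approaches differ when a base name contains more than one dot.
dotted <- "sample.v2.CEL"                              # hypothetical name
last_only  <- tools::file_path_sans_ext(dotted)        # "sample.v2"
first_dot  <- sub("(^[^\\.]*)\\..+$", "\\1", dotted)   # "sample"
```

For the file names shown in this thread the base names contain a single dot, so both approaches should give the same result.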
On Wed, 21 Oct 2020 16:15:22 -0500 Ana Marija <sokovic.anamarija at gmail.com> wrote:

> I tried doing this:
>
> b= celFiles[!basename(celFiles) %in% as.character(remove$V1)]

I would advise you to *look* at basename(celFiles)!!!

The entries end in ".CEL"; the names in remove$V1 do not. So %in% finds no matches. Perhaps:

b <- celFiles[!basename(celFiles) %in% paste0(as.character(remove$V1), ".CEL")]

Note that, for the data that you have presented, none of the entries of celFiles "match up" with "remove", so it is *still* the case that (for the data shown) none of the entries will be removed. So your example was bad.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
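
One way to check the point about the example data before filtering is to count the matches first. The sketch below assumes a made-up directory "/some/dir" and uses only the first two entries of each object from the post; the names targets, hits and missing are introduced here for illustration.

```r
# First two entries of each object from the post; "/some/dir" is made up.
remove <- data.frame(V1 = c("ABAFT_g_4RWG569_BI_SNP_A10_35096",
                            "ABAFT_g_4RWG569_BI_SNP_B12_35130"),
                     stringsAsFactors = FALSE)
celFiles <- file.path("/some/dir",
                      c("ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL",
                        "ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL"))

# Append the extension to the names instead of stripping it from the files.
targets <- paste0(as.character(remove$V1), ".CEL")
hits <- basename(celFiles) %in% targets

# How many files would be dropped, and which names have no matching file?
n_hits <- sum(hits)
missing <- setdiff(targets, basename(celFiles))
print(n_hits)
print(missing)

b <- celFiles[!hits]
```

Here n_hits is 0 and both targets appear in missing, which matches the observation that the entries shown by head(celFiles) do not overlap with "remove".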
Makes sense, thank you!