Knut Krueger
2018-Oct-22 14:37 UTC
[R] match() question or needle haystack problem for a data.frame
Hi to all,

I would like to reduce "Mydata" to only those rows whose Mydata$DATA1 value is in needles:

needles =c(14390, 14391, 14392, 14427, 14428, 14429, 14430, 14431,
14432, 14433, 14434, 14435, 14436, 14437, 14439, 14440, 14441, 15195,
15196, 15197, 15198, 15199, 15200, 15201, 15202, 15203, 15204, 15205,
15206, 15207, 15208, 15209, 17615, 17616, 17617, 17618, 17619, 17620,
17621, 17622, 17623, 17624, 17625, 17626, 17627, 17628, 17629, 17630,
17631, 17679, 17680, 17681, 17682, 17683, 17823, 17824, 17825, 17826,
17827, 17828, 17829, 17830, 17831, 17862, 17863, 17864, 17865, 17866,
17867, 17868, 17869, 17870, 17871, 17872, 17873, 17874, 17875, 17876,
17877, 17878, 17879, 17880, 17881, 17882, 17883, 19255, 19256, 19257,
19258, 21289, 21290, 21291, 21292, 22890, 22891, 22892, 22893, 22894,
22895, 22896, 22897, 22898, 22899, 22900, 22901, 22902, 40428, 40429,
40430, 40431, 40432, 40433, 40434, 40435, 40436, 40437)

Haystack =c(14390, 14391, 14392, 14427, 14428, 14429, 14430, 14431,
14432, 14433, 14434, 14435, 14436, 14437, 14439, 14440, 14441, 15187,
15188, 15195, 15196, 15197, 15198, 15199, 15200, 15201, 15202, 15203,
15204, 15205, 15206, 15207, 15208, 15209, 16717, 16718, 17615, 17616,
17617, 17618, 17619, 17620, 17621, 17622, 17623, 17624, 17625, 17626,
17627, 17628, 17629, 17630, 17631, 17679, 17680, 17681, 17682, 17683,
17817, 17818, 17823, 17824, 17825, 17826, 17827, 17828, 17829, 17830,
17831, 17862, 17863, 17864, 17865, 17866, 17867, 17868, 17869, 17870,
17871, 17872, 17873, 17874, 17875, 17876, 17877, 17878, 17879, 17880,
17881, 17882, 17883, 17886, 19255, 19256, 19257, 19258, 21289, 21290,
21291, 21292, 22890, 22891, 22892, 22893, 22894, 22895, 22896, 22897,
22898, 22899, 22900, 22901, 22902, 40428, 40429, 40430, 40431, 40432,
40433, 40434, 40435, 40436, 40437, 40710, 40711, 49127, 49128, 52768)

Mydata =data.frame(DATA1=Haystack, Data2=c(1:length(Haystack)))

match(Mydata$DATA1, needles, nomatch=NA) does find all values that are in needles; the others are set to the nomatch value.

But I cannot figure out how to reduce the data.frame. Maybe match() is not helpful for that.

Kind regards
Knut
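For illustration, here is a minimal sketch on hypothetical toy vectors (a few values picked from the lists above, not the full data) showing what that match() call returns: one entry per row of Mydata, holding the position of the DATA1 value in needles, or NA when the value does not occur there.

```r
# Hypothetical toy data standing in for the much longer vectors above
needles  <- c(14390, 14391, 15195)
haystack <- c(14390, 15187, 14391, 15195, 52768)
Mydata   <- data.frame(DATA1 = haystack, Data2 = seq_along(haystack))

# For each DATA1 value: its position in `needles`, or NA if absent
pos <- match(Mydata$DATA1, needles, nomatch = NA)
print(pos)  # expected: 1 NA 2 3 NA
```

The NA entries mark exactly the rows that should be dropped.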
Bert Gunter
2018-Oct-22 14:47 UTC
[R] match() question or needle haystack problem for a data.frame
Re-read ?match and note the examples for %in%

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
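A minimal sketch of the %in% approach, again on hypothetical toy vectors. The ?match help page describes %in% as match(x, table, nomatch = 0) > 0, so it yields a plain TRUE/FALSE vector (no NA) that can index the rows of the data frame directly. The names keep and dropped are illustrative only.

```r
# Hypothetical toy data standing in for the vectors in the question
needles  <- c(14390, 14391, 15195)
haystack <- c(14390, 15187, 14391, 15195, 52768)
Mydata   <- data.frame(DATA1 = haystack, Data2 = seq_along(haystack))

# TRUE where DATA1 occurs in needles, FALSE otherwise
keep <- Mydata$DATA1 %in% needles

found   <- Mydata[keep, ]   # rows whose DATA1 is in needles
dropped <- Mydata[!keep, ]  # all remaining rows
print(found)
```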
Eric Berger
2018-Oct-22 15:01 UTC
[R] match() question or needle haystack problem for a data.frame
Hi Knut,

You are almost done.

v <- match(Mydata$DATA1, needles, nomatch=NA)
found <- Mydata[ !is.na(v), ]
missing <- Mydata[ is.na(v), ]

HTH,
Eric
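A sketch, on the same kind of hypothetical toy data, checking that the !is.na(match(...)) index above, the %in% index, and base R's subset() all select the same rows. The object names found_match, found_in and found_subset are illustrative.

```r
# Hypothetical toy data standing in for the vectors in the question
needles  <- c(14390, 14391, 15195)
haystack <- c(14390, 15187, 14391, 15195, 52768)
Mydata   <- data.frame(DATA1 = haystack, Data2 = seq_along(haystack))

# Index built from match(): non-NA positions are the rows to keep
v <- match(Mydata$DATA1, needles, nomatch = NA)
found_match <- Mydata[!is.na(v), ]

# Same rows via a logical %in% index
found_in <- Mydata[Mydata$DATA1 %in% needles, ]

# Same rows via subset(), which evaluates DATA1 inside Mydata
found_subset <- subset(Mydata, DATA1 %in% needles)

print(found_match)
```

All three keep the original row names (here 1, 3 and 4); rownames(found_match) <- NULL renumbers the rows if that matters.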