John Kane
2007-Nov-01 16:44 UTC
[R] subsetting problem with multiple criteria: Works in some but not all cases.
I am trying to compare some word lists, each of which has an associated set of numbers. I want to compare word list aa with bb and find only those words which are unique to bb, then compare bb with cc, etc.

I thought that I should be able to do this by using setdiff to get the unique words and then subsetting the data frame to get the unique names and corresponding numbers, but I am misunderstanding something.

When I run the code below a) I get lots of warnings and b) I get the correct results for 4 of the 5 comparisons. However, the comparison of three with four (cc, dd) gives me an empty subset.

Can anyone point out my error or suggest a better way to do this?

Thanks

===========================================================================
mydata = data.frame(aa = Cs(cat, dog, horse, cow),
                    bb = c("mouse", "dog", "cow", "pigeon"),
                    cc = c("emu", "rat", "crow", "cow"),
                    dd = c("cow", "camel", "manatee", "parrot"),
                    ee = c("coat", "hat", "dog", "camel"),
                    ff = c("knife", "dog", "cow", "pigeon"),
                    ann = c(1,2,3,4),
                    bnn = c(5,6,7,8),
                    cnn = c(9,10,11,12),
                    dnn = c(13,14,15,16),
                    enn = c(17,18,19,20),
                    fnn = c(21,22,23,24))

wordnames <- c("word", "number")
word.list <- rep(vector("list", 1), 5)

for(j in 1:5) {
  lone.word <- setdiff(mydata[,j+1], mydata[,j]); lone.word
  matching <- subset(mydata[,c(j+1,j+7)], mydata[,j+1]==lone.word); matching
  word.list[[j]] <- matching; names(word.list[[j]]) <- wordnames
}
word.list
===========================================================================

R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_Canada.1252;LC_CTYPE=English_Canada.1252;LC_MONETARY=English_Canada.1252;LC_NUMERIC=C;LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Hmisc_3.4-2 gdata_2.3.1

loaded via a namespace (and not attached):
[1] cluster_1.11.9 grid_2.6.0 gtools_2.4.0 lattice_0.17-1
jim holtman
2007-Nov-01 20:27 UTC
[R] subsetting problem with multiple criteria: Works in some but not all cases.
I think the problem is with your use of "==" instead of "%in%"; try

matching <- subset(mydata[,c(j+1,j+7)], mydata[,j+1] %in% lone.word)

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
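To see why "==" can come back empty where "%in%" does not, here is a minimal sketch using just the cc and dd words from the original post. It assumes plain character vectors rather than the data frame columns, purely to keep it short; the names eq and inn are made up for this illustration.

```r
# Words from columns cc and dd of the original post, as plain character vectors.
cc <- c("emu", "rat", "crow", "cow")
dd <- c("cow", "camel", "manatee", "parrot")

# Words that occur in dd but not in cc.
lone.word <- setdiff(dd, cc)

# `==` compares position by position and recycles the shorter vector.
# Here a length-4 vector meets a length-3 vector, so R also warns that the
# longer length is not a multiple of the shorter one (suppressed for the demo).
eq <- suppressWarnings(dd == lone.word)

# `%in%` asks, for each element of dd, whether it occurs anywhere in lone.word.
inn <- dd %in% lone.word

print(lone.word)  # "camel" "manatee" "parrot"
print(eq)         # FALSE FALSE FALSE FALSE
print(inn)        # FALSE  TRUE  TRUE  TRUE
```

With "==", a comparison only looks right when the recycled positions happen to line up, which would explain why some of the comparisons in the loop appeared to work while cc against dd selected nothing.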
John Kane
2007-Nov-02 14:54 UTC
[R] subsetting problem with multiple criteria: Works in some but not all cases.
Thank you Jim. That seems to work perfectly.

I had looked at %in% and apparently misunderstood what it would do. Is there any place where I can read up on the various %XX% functions?
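For reference, the loop from the original post with Jim's "%in%" change applied might look like the sketch below. Two things are assumptions made here and not part of the thread: the Hmisc helper Cs() is replaced by c() with quoted strings so that no extra package is needed, and stringsAsFactors = FALSE is set so the word columns stay character regardless of R version.

```r
# Data from the original post; c("cat", ...) stands in for Hmisc's Cs(cat, ...).
mydata <- data.frame(
  aa = c("cat", "dog", "horse", "cow"),
  bb = c("mouse", "dog", "cow", "pigeon"),
  cc = c("emu", "rat", "crow", "cow"),
  dd = c("cow", "camel", "manatee", "parrot"),
  ee = c("coat", "hat", "dog", "camel"),
  ff = c("knife", "dog", "cow", "pigeon"),
  ann = c(1, 2, 3, 4),     bnn = c(5, 6, 7, 8),
  cnn = c(9, 10, 11, 12),  dnn = c(13, 14, 15, 16),
  enn = c(17, 18, 19, 20), fnn = c(21, 22, 23, 24),
  stringsAsFactors = FALSE  # assumption: keep words as character
)

wordnames <- c("word", "number")
word.list <- vector("list", 5)

for (j in 1:5) {
  # Words in column j+1 that do not appear in column j.
  lone.word <- setdiff(mydata[, j + 1], mydata[, j])
  # Keep rows whose word is one of those; the number column sits 6 columns
  # to the right of its word column, hence j + 7.
  matching <- subset(mydata[, c(j + 1, j + 7)], mydata[, j + 1] %in% lone.word)
  names(matching) <- wordnames
  word.list[[j]] <- matching
}

# The cc/dd comparison that previously came back empty.
print(word.list[[3]])
```

Every element of word.list now has at least one row, and the third one holds camel, manatee and parrot with the numbers 14, 15 and 16.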
jim holtman
2007-Nov-02 21:05 UTC
[R] subsetting problem with multiple criteria: Works in some but not all cases.
You can use apropos to find them and then list out their help pages:

> apropos("%")
[1] "%%"   "%*%"  "%/%"  "%in%" "%o%"  "%x%"
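As a small follow-up sketch on finding and reading about these operators: apropos() treats its argument as a regular expression, and an operator's help page can be opened by quoting its name. The `%notin%` operator below is hypothetical, defined only to show that the %XX% form is ordinary function syntax; it is not part of base R.

```r
# Help pages for special operators need the name quoted, for example:
#   ?"%in%"
#   help("%o%")

# Restrict apropos() to names that both start and end with a percent sign.
# The exact result depends on which packages are attached.
ops <- apropos("^%.*%$")
print(ops)

# %in% is a thin wrapper around match(): TRUE wherever a match is found.
x <- c("cow", "emu")
tab <- c("cow", "dog")
same <- identical(x %in% tab, match(x, tab, nomatch = 0L) > 0L)
print(same)

# Any function named %something% can be used as an infix operator.
# `%notin%` is a made-up name for this illustration.
`%notin%` <- function(x, table) !(x %in% table)
print(x %notin% tab)  # FALSE  TRUE
```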
John Kane
2007-Nov-05 16:00 UTC
[R] subsetting problem with multiple criteria: Works in some but not all cases.
Excellent, thank you.