thr3ads.net - R help - [R] subsetting problem with multiple criteria: Works in some but not all cases. [Nov 2007]


John Kane

2007-Nov-01 16:44 UTC

[R] subsetting problem with multiple criteria: Works in some but not all cases.

I am trying to compare some word lists, each of which has an associated set of numbers. I want to compare word list aa with bb and find only those words which are unique to bb, then compare bb with cc, etc.

I thought that I should be able to do this by using setdiff to get the unique words and then subsetting the data frame to get the unique names and corresponding numbers, but I am misunderstanding something.

When I run the code below, a) I get lots of warnings and b) I get the correct results for 4 of the 5 comparisons. However, the comparison of three with four (cc, dd) gives me an empty subset.

Can anyone point out my error or suggest a better way to do this?
Thanks

===========================================================================
# Cs() comes from Hmisc: Cs(cat, dog) is the same as c("cat", "dog")
mydata <- data.frame(aa = Cs(cat, dog, horse, cow),
                     bb = c("mouse", "dog", "cow", "pigeon"),
                     cc = c("emu", "rat", "crow", "cow"),
                     dd = c("cow", "camel", "manatee", "parrot"),
                     ee = c("coat", "hat", "dog", "camel"),
                     ff = c("knife", "dog", "cow", "pigeon"),
                     ann = c(1, 2, 3, 4),
                     bnn = c(5, 6, 7, 8),
                     cnn = c(9, 10, 11, 12),
                     dnn = c(13, 14, 15, 16),
                     enn = c(17, 18, 19, 20),
                     fnn = c(21, 22, 23, 24))

wordnames <- c("word", "number")
word.list <- rep(vector("list", 1), 5)

for (j in 1:5) {
  # words in column j + 1 that do not occur in column j
  lone.word <- setdiff(mydata[, j + 1], mydata[, j])
  # intended: keep those words and their numbers (column j + 7)
  matching <- subset(mydata[, c(j + 1, j + 7)],
                     mydata[, j + 1] == lone.word)
  word.list[[j]] <- matching
  names(word.list[[j]]) <- wordnames
}
word.list

===========================================================================
R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_Canada.1252;LC_CTYPE=English_Canada.1252;LC_MONETARY=English_Canada.1252;LC_NUMERIC=C;LC_TIME=English_Canada.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Hmisc_3.4-2 gdata_2.3.1

loaded via a namespace (and not attached):
[1] cluster_1.11.9 grid_2.6.0     gtools_2.4.0   lattice_0.17-1
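A quick way to see where the warnings come from is to count how many words setdiff returns for each of the five comparisons. The sketch below rebuilds only the six word columns under a new, illustrative name (words), using plain c() in place of Hmisc's Cs() and stringsAsFactors = FALSE so it runs the same way on any R version; both are assumptions for the example, not part of the post.

```r
# Only the word columns of mydata, rebuilt without Hmisc
words <- data.frame(
  aa = c("cat", "dog", "horse", "cow"),
  bb = c("mouse", "dog", "cow", "pigeon"),
  cc = c("emu", "rat", "crow", "cow"),
  dd = c("cow", "camel", "manatee", "parrot"),
  ee = c("coat", "hat", "dog", "camel"),
  ff = c("knife", "dog", "cow", "pigeon"),
  stringsAsFactors = FALSE
)

# For j = 1..5: how many words in column j + 1 are absent from column j
lone.lengths <- sapply(1:5, function(j) length(setdiff(words[, j + 1], words[, j])))
lone.lengths   # 2 3 3 3 3, never the column length of 4
```

Because == compares element by element, each lone.word is recycled against a column of length 4. R warns whenever the longer length is not a multiple of the shorter one, which is the case for the four comparisons that return 3 words.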



jim holtman

2007-Nov-01 20:27 UTC


I think the problem is with your use of "==" instead of
"%in%"; try

matching <- subset(mydata[,c(j+1,j+7)], mydata[,j+1] %in% lone.word)
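To see why == and %in% behave differently here, the sketch below replays the failing third comparison (dd against cc) on plain character vectors copied from the post. The names eq.test and in.test are illustrative only.

```r
# Columns dd and cc from the post, as plain character vectors
dd <- c("cow", "camel", "manatee", "parrot")
cc <- c("emu", "rat", "crow", "cow")
lone.word <- setdiff(dd, cc)   # "camel" "manatee" "parrot"

# == pairs elements up position by position, recycling the shorter vector:
# cow/camel, camel/manatee, manatee/parrot, parrot/camel.
# suppressWarnings() only hides the "longer object length" recycling warning.
eq.test <- suppressWarnings(dd == lone.word)
eq.test   # FALSE FALSE FALSE FALSE, so subset() returns no rows

# %in% asks, for each element of dd, whether it occurs anywhere in lone.word
in.test <- dd %in% lone.word
in.test       # FALSE TRUE TRUE TRUE
dd[in.test]   # "camel" "manatee" "parrot"
```

The same recycling explains why some comparisons looked right by luck: for bb against aa, the two lone words recycle as mouse/pigeon/mouse/pigeon, which happens to line up with the positions where they occur in bb.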




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

John Kane

2007-Nov-02 14:54 UTC


Thank you Jim.  That seems to work perfectly.  
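For reference, a sketch of the whole comparison with Jim's %in% line in place. It swaps the pre-allocated list and for loop for lapply(), which returns the five data frames as a list directly; plain c() replaces Hmisc's Cs(), and stringsAsFactors = FALSE keeps the words as character. These are editorial choices, not part of the thread.

```r
# Word columns and their numbers, as in the original post
mydata <- data.frame(
  aa = c("cat", "dog", "horse", "cow"),
  bb = c("mouse", "dog", "cow", "pigeon"),
  cc = c("emu", "rat", "crow", "cow"),
  dd = c("cow", "camel", "manatee", "parrot"),
  ee = c("coat", "hat", "dog", "camel"),
  ff = c("knife", "dog", "cow", "pigeon"),
  ann = c(1, 2, 3, 4), bnn = c(5, 6, 7, 8), cnn = c(9, 10, 11, 12),
  dnn = c(13, 14, 15, 16), enn = c(17, 18, 19, 20), fnn = c(21, 22, 23, 24),
  stringsAsFactors = FALSE
)

wordnames <- c("word", "number")

# For j = 1..5: rows of word column j + 1 whose word is absent from
# column j, together with the matching number column j + 7
word.list <- lapply(1:5, function(j) {
  lone.word <- setdiff(mydata[, j + 1], mydata[, j])
  matching <- subset(mydata[, c(j + 1, j + 7)], mydata[, j + 1] %in% lone.word)
  names(matching) <- wordnames
  matching
})
word.list
```

With this change the cc/dd comparison (word.list[[3]]) holds camel, manatee and parrot with the numbers 14, 15 and 16, and no recycling warnings are produced.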

I had looked at %in% and apparently misunderstood what it would do. Is there any place where I can read about the various %XX% functions?


jim holtman

2007-Nov-02 21:05 UTC


You can use apropos to find them and then list out their help pages:

> apropos("%")
[1] "%%"   "%*%"  "%/%"  "%in%" "%o%"  "%x%"
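The %...% names in that list are ordinary functions used as infix operators, and R lets any function whose name starts and ends with % be called the same way. The sketch below checks that %in% agrees with match(x, table, nomatch = 0) > 0, which is how the match help page describes it, and defines %notin%, a hypothetical helper that is not part of base R. The vectors x and lookup are illustrative.

```r
x <- c("cow", "camel", "manatee", "parrot")
lookup <- c("camel", "manatee", "parrot")

# %in% gives the same answer as match(x, lookup, nomatch = 0) > 0
stopifnot(identical(x %in% lookup, match(x, lookup, nomatch = 0) > 0))

# A user-defined infix operator (hypothetical helper, not in base R):
# TRUE where an element of x does NOT occur anywhere in table
`%notin%` <- function(x, table) !(x %in% table)
x %notin% lookup   # TRUE FALSE FALSE FALSE

# Operator help pages need the name quoted, e.g. ?"%in%" or help("%in%")
```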




John Kane

2007-Nov-05 16:00 UTC


Excellent, thank you.
