thr3ads.net - R help - [R] R : how does %in% operator work? [Aug 2009]


Moumita Das

2009-Aug-17 13:57 UTC

[R] R : how does %in% operator work?

*Problem-1*



CASE-I---------(works fine)
> var1<-"tom"
> var1
[1] "tom"
>  var1<-as.character(var1)
>  var1
[1] "tom"
>  var2<-c("tom","harry","kate")
> logc<-(var1 %in% var2)
> logc
[1] TRUE
> typeof(var1)
[1] "character"
> typeof(var2)
[1] "character"



*CASE-II---------(doesn’t  work)*

I have a dynamically generated dataset on which I want to use this %in%
operator, but it's not working.



*predictors_values data frame is shown below:---------------*

           x
2  recmeanC2
3  recmeanC3
4  recmeanC4
5         i1
6         i2
7         i3
8         i4
9         i5
10        i6
11        i7
12        i8
13        i9
14       i10
15       i11
16       i12
17       i13
18       i14
19       i15

*coef_dataframe_rownames data frame is shown below:----*

   if (stringsAsFactors) factor(x) else x
1                               recmeanC2
2                               recmeanC3
3                               recmeanC4
4                                      i1
5                                      i2
6                                      i3
7                                      i4
8                                      i5
9                                      i6
10                                     i7
11                                     i8
12                                     i9
13                                    i10
14                                    i12
15                                    i13



*Here is part of my code:*

predictor<-predictors_values[1,1]
predictor<-as.character(predictor)
predictor<-noquote(predictor)
print("predictor")
print(predictor) ##prints recmeanC1

print("coef_dataframe_rownames")
#coef_dataframe_rownames<-c(coef_dataframe_rownames)
#coef_dataframe_rownames<-c("recmeanC2","recmeanC3","recmeanC4","i1")
#only when I hard-code it this way do I get correct values for logc (you will find logc below)
names(coef_dataframe_rownames)<-letters[1]
coef_dataframe_rownames<-c(coef_dataframe_rownames)
print(coef_dataframe_rownames)

#prints

[1] "coef_dataframe_rownames"
$a
 [1] recmeanC2 recmeanC3 recmeanC4 i1        i2        i3        i4
 [8] i5        i6        i7        i8        i9        i10       i12
[15] i13

print(typeof(predictor))
print(typeof(coef_dataframe_rownames))
logc<-(predictor %in% coef_dataframe_rownames)
print("logc")
print(logc) # prints FALSE

To make logc<-(predictor %in% coef_dataframe_rownames) work, I have tried
converting predictor and coef_dataframe_rownames to many different data types:
both vectors, both data frames, predictor to character and
coef_dataframe_rownames to a vector... but nothing seems to work.

[ If predictor is in coef_dataframe_rownames, do task 1, else do task 2 ]

Here predictors_values is a data frame of all possible predictors when a
particular element's regression is to be done, and coef_dataframe_rownames is
the data frame of row names of the coefficients table produced by the
regression function.
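A minimal sketch of the "task 1 / task 2" check described above, assuming (hypothetically) that the row names sit in the first column of a one-column data frame, possibly stored as a factor. The data and the `task1()`/`task2()` functions are illustrative placeholders, not the original code; the key step is extracting the column as a plain character vector before using %in%.

```r
# Hypothetical stand-ins for the poster's data frames
predictors_values <- data.frame(x = c("recmeanC2", "i1", "i11"),
                                stringsAsFactors = FALSE)
coef_dataframe_rownames <- data.frame(x = c("recmeanC2", "recmeanC3", "i1"),
                                      stringsAsFactors = TRUE)

# Placeholder tasks (hypothetical, not from the original code)
task1 <- function(p) paste("task 1 for", p)
task2 <- function(p) paste("task 2 for", p)

# Extract the single column as a plain character vector before matching
known <- as.character(coef_dataframe_rownames[[1]])

# vapply() over a character vector names each result by its input
results <- vapply(as.character(predictors_values[[1]]), function(p) {
  if (p %in% known) task1(p) else task2(p)
}, character(1))
print(results)
```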

*Problem-2:*

I wanted something like Problem-1 because of Problem-2.

If every value in some rows of the coefficients table is NA, those rows are
omitted automatically when I access only the coefficients table like this:

fit<-lm(item_category_table[element_n_predictors_string_to_vector],singular.ok=TRUE)

Coefficients<-summary(fit)$coefficients

Because I am running loops to enter the values of the coefficients table into
the database tables, the omission of the all-NA rows is causing problems. Even
though these rows have no values, I still need to populate the database tables
for these NA rows of the coefficients table.

*Is there any way to get the full coefficients table without the NA-containing
rows being omitted?*



Print gives this:

[1] "coef_dataframe without intercept"  # I have omitted the intercept, please do not get confused

               Estimate   Std. Error       t value   Pr(>|t|)
recmeanC2  9.275880e-17 6.322780e-17  1.467057e+00 0.14349903
recmeanC3  1.283534e-17 2.080644e-17  6.168929e-01 0.53781390
recmeanC4 -3.079466e-17 2.565499e-17 -1.200338e+00 0.23103743
i1         5.000000e-01 1.036197e-17  4.825338e+16 0.00000000
i2        -5.630739e-18 1.638267e-17 -3.437010e-01 0.73133282
i3         4.291387e-18 1.207522e-17  3.553879e-01 0.72257050
i4         1.472662e-17 1.423051e-17  1.034863e+00 0.30163897
i5         5.000000e-01 1.003323e-17  4.983441e+16 0.00000000
i6         5.147966e-18 1.569095e-17  3.280850e-01 0.74309614
i7         1.096044e-17 1.555829e-17  7.044760e-01 0.48173041
i8        -1.166290e-18 1.287370e-17 -9.059482e-02 0.92788026
i9         1.627371e-17 1.540567e-17  1.056345e+00 0.29173427
i10        4.001692e-18 1.365740e-17  2.930053e-01 0.76973827
i12       -1.052843e-17 1.324484e-17 -7.949081e-01 0.42735000
i13        2.571236e-17 1.357336e-17  1.894325e+00 0.05922715


Whereas summary(fit) gives:

Coefficients: (3 not defined because of singularities)
              Estimate Std. Error    t value Pr(>|t|)
(Intercept)  2.808e-16  1.579e-17  1.778e+01   <2e-16 ***
recmeanC2    9.276e-17  6.323e-17  1.467e+00   0.1435
recmeanC3    1.283e-17  2.081e-17  6.170e-01   0.5378
recmeanC4   -3.080e-17  2.566e-17 -1.200e+00   0.2310
i1           5.000e-01  1.036e-17  4.825e+16   <2e-16 ***
i2          -5.631e-18  1.638e-17 -3.440e-01   0.7313
i3           4.291e-18  1.207e-17  3.550e-01   0.7226
i4           1.473e-17  1.423e-17  1.035e+00   0.3016
i5           5.000e-01  1.003e-17  4.983e+16   <2e-16 ***
i6           5.148e-18  1.569e-17  3.280e-01   0.7431
i7           1.096e-17  1.556e-17  7.040e-01   0.4817
i8          -1.166e-18  1.287e-17 -9.100e-02   0.9279
i9           1.627e-17  1.541e-17  1.056e+00   0.2917
i10          4.002e-18  1.366e-17  2.930e-01   0.7697
i11                 NA         NA         NA       NA
i12         -1.053e-17  1.325e-17 -7.950e-01   0.4273
i13          2.571e-17  1.357e-17  1.894e+00   0.0592 .
i14                 NA         NA         NA       NA
i15                 NA         NA         NA       NA
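One way to rebuild the full table, including the all-NA rows that summary(fit) prints for aliased terms, is sketched below. It relies on `coef(fit)` keeping aliased coefficients as NA while `summary(fit)$coefficients` drops them. The simulated data (with `x3` deliberately collinear) is a hypothetical stand-in for `item_category_table`, and an explicit formula is used instead of the original call.

```r
# Hypothetical data: x3 is an exact linear combination, so lm() aliases it
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + d$x2

fit <- lm(y ~ x1 + x2 + x3, data = d)

cf <- coef(fit)                    # all coefficients, NA for aliased terms
st <- summary(fit)$coefficients    # aliased rows are dropped here

# One row per coefficient; rows that could not be estimated stay NA
full <- matrix(NA_real_, nrow = length(cf), ncol = ncol(st),
               dimnames = list(names(cf), colnames(st)))
full[rownames(st), ] <- st

# Drop the intercept row, as in the post
full_no_int <- full[rownames(full) != "(Intercept)", , drop = FALSE]
print(full_no_int)
```

`summary(fit)$aliased` is a named logical vector that flags the same rows, which may be convenient when deciding what to write to the database for each coefficient.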






I know there are other comparison functions such as all.equal, identical,
compare and setdiff. I do not have the compare function; all.equal doesn't
solve my problem, as it just compares and reports the differences; setdiff and
identical didn't work either. I know there's a problem with the data in
coef_dataframe_rownames, because:

coef_dataframe_rownames<-c("recmeanC2","recmeanC3","recmeanC4","i1")  #only when I hard-code it this way do I get correct values for logc

How should I treat my dataset to get correct values?






-- 
Thanks
Moumita


Kenn Konstabel

2009-Aug-18 12:55 UTC


[R] R : how does %in% operator work?

It would be helpful to give a MUCH shorter example. The problem you have
doesn't seem to be too complicated -- you don't need to explain all possible
details, just the ones that you think might cause the problem. (Saying "it
doesn't work" isn't helpful -- please be more specific and tell us what you
expect and what you got. Also, a lot of your code is probably irrelevant to
the problem.)

Now after a cursory reading I think you're comparing a vector (see ?vector)
to a data frame. You can do this if you know what you're doing but currently
the result doesn't seem to be what you expect.

a <-1
b <- data.frame(boo=1)
a%in%b
# TRUE

a <- 1
b <- data.frame(boo=1:2)
a%in%b
# FALSE

match and %in% convert factors and lists (a data frame is a list) to character
before comparing (see ?match or ?"%in%" !!!!), so your typeof checks are
irrelevant. See what happens if you convert a data frame to character:

as.character( data.frame(a=c(1,2,3), b=c(3,5,7)))
# [1] "c(1, 2, 3)" "c(3, 5, 7)"
# (I wouldn't have expected exactly this but maybe it makes sense)
# (at least, it makes sense in the context of match and %in%)
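A small sketch of the conversion rules for the table argument of %in%, with illustrative values only: a factor is matched by its labels, a list whose elements each have length 1 behaves like a character vector, but a list holding one longer vector (such as a one-column data frame) collapses to a single deparsed string.

```r
in_factor      <- "b" %in% factor(c("a", "b"))    # TRUE: factor matched by its labels
in_flat_list   <- "a" %in% list("a", "b")         # TRUE: elements become "a", "b"
in_nested_list <- "a" %in% list(c("a", "b"))      # FALSE: element becomes one string
deparsed       <- as.character(list(c("a", "b"))) # the single string c("a", "b")
print(c(in_factor, in_flat_list, in_nested_list))
print(deparsed)
```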

So *maybe* the solution to your problem is to make sure that *both*
arguments that you give to %in% are vectors, not data frames, not anything
else (use $ or [[ with data frames):

a %in% b$boo
#TRUE
# "1" is not %in% "1:2" but it is %in% "1" (which makes sense)
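Applied to a case shaped like the original post, a minimal sketch (assuming, as the column header `if (stringsAsFactors) factor(x) else x` suggests, that the row names are a factor column named `x`; the data here is illustrative): passing the whole data frame fails, while extracting the column with $ or [[ works.

```r
# Illustrative one-column data frame with a factor column, mimicking the post
coef_dataframe_rownames <- data.frame(
  x = c("recmeanC2", "recmeanC3", "recmeanC4", "i1"),
  stringsAsFactors = TRUE
)
predictor <- "recmeanC2"

whole_df  <- predictor %in% coef_dataframe_rownames       # FALSE: whole data frame
by_dollar <- predictor %in% coef_dataframe_rownames$x     # TRUE: column by name
by_index  <- predictor %in% coef_dataframe_rownames[[1]]  # TRUE: first column
# Plain character vector equivalent to the hard-coded c(...) in the post
as_vector <- as.character(coef_dataframe_rownames[[1]])
print(c(whole_df, by_dollar, by_index))
print(as_vector)
```

Using [[1]] avoids depending on the awkward deparsed column name.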

If not, try to make your question and examples shorter and clearer.

Regards,
KK

