thr3ads.net - R help - [R] Q re: logical indexing with is.na [Mar 2019]

David Goldsmith

2019-Mar-10 01:36 UTC

[R] Q re: logical indexing with is.na

Hi!  Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
new to statistics (have had grad-level courses and work experience in
statistics) or vectorized programming syntax (have extensive experience
with MatLab, Python/NumPy, and IDL, and even a smidgen--a long time ago--of
experience w/ S-plus).

In exploring the use of is.na in the context of logical indexing, I've come
across the following puzzling-to-me result:
> y; !is.na(y[1:3]); y[!is.na(y[1:3])]
[1]  0.3534253 -1.6731597         NA -0.2079209
[1]  TRUE  TRUE FALSE
[1]  0.3534253 -1.6731597 -0.2079209

As you can see, y is a four element vector, the third element of which is
NA; the next line gives what I would expect--T T F--because the first two
elements are not NA but the third element is.  The third line is what
confuses me: why is the result not the two element vector consisting of
simply the first two elements of the vector (or, if vectorized indexing in
R is implemented to return a vector the same length as the logical index
vector, which appears to be the case, at least the first two elements and
then either NA or NaN in the third slot, where the logical indexing vector
is FALSE): why does the implementation "go looking" for an element whose
index in the "original" vector, 4, is larger than BOTH the largest index
specified in the inner-most subsetting index AND the size of the resulting
indexing vector?  (Note: at first I didn't even understand why the result
wasn't simply

0.3534253 -1.6731597         NA

but then I realized that the third logical index being FALSE, there was no
reason for *any* element to be there; but if there is, due to some
overriding rule regarding the length of the result relative to the length
of the indexer, shouldn't it revert back to *something* that indicates the
"FALSE"ness of that indexing element?)

Thanks!

DLG
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ISwR_2.0-7

loaded via a namespace (and not attached):
[1] compiler_3.5.2 tools_3.5.2

Richard M. Heiberger

2019-Mar-10 02:30 UTC

[R] Q re: logical indexing with is.na

From ?Arithmetic:

     the elements of shorter
     vectors are recycled as necessary (with a 'warning' when they are
     recycled only _fractionally_).

> tmp <- !is.na(y[1:3])
> tmp
[1]  TRUE  TRUE FALSE
> c(tmp, tmp)
[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
> c(tmp, tmp)[1:4]
[1]  TRUE  TRUE FALSE  TRUE
> y[c(tmp, tmp)[1:4]]
[1]  0.3534253 -1.6731597 -0.2079209

The behavior is as documented.  I am surprised that there is no
warning about partial recycling.

Rolf Turner

2019-Mar-10 02:57 UTC

[R] [FORGED] Q re: logical indexing with is.na

It happens because R is eco-conscious and re-cycles. :-)

Try:

ok <- c(TRUE,TRUE,FALSE)
(1:4)[ok]

In general in R if there is an operation involving two vectors then
the shorter one gets recycled to provide sufficiently many entries to 
match those of the longer vector.

Thus in the foregoing example the first entry of "ok" gets used again,
to make a length 4 vector to match up with 1:4.  The result is the same
as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].

If you did (1:7)[ok] you'd get the same result as that from
(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
recycled 2 and 1/3 times.
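
The two equivalences above can be checked directly. This is only a small sketch; the result names (a, b, c7, d7) are made up for illustration.

```r
ok <- c(TRUE, TRUE, FALSE)

# Length-3 logical index on a length-4 vector: "ok" is recycled to length 4.
a <- (1:4)[ok]
b <- (1:4)[c(TRUE, TRUE, FALSE, TRUE)]
print(identical(a, b))    # TRUE

# Length-7 vector: "ok" is recycled 2 and 1/3 times.
c7 <- (1:7)[ok]
d7 <- (1:7)[c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)]
print(c7)                 # 1 2 4 5 7
print(identical(c7, d7))  # TRUE
```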

Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .

Note that in the first two instances you get warnings, but in the third
you don't, since 6 is an integer multiple of 3.
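
To see which of those three sums warns without reading the console messages, something along these lines should do. The helper name `warned` is hypothetical (not a base R function); it just wraps tryCatch().

```r
# Hypothetical helper: evaluate an expression, return TRUE if it raised a warning.
warned <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

w4 <- warned(10 * (1:3) + 1:4)   # TRUE:  4 is not a multiple of 3
w5 <- warned(10 * (1:3) + 1:5)   # TRUE:  5 is not a multiple of 3
w6 <- warned(10 * (1:3) + 1:6)   # FALSE: 6 is a multiple of 3

# The recycled result is still computed; the warning is only a notice.
s4 <- suppressWarnings(10 * (1:3) + 1:4)
print(s4)                        # 11 22 33 14
```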

Why aren't there warnings when logical indexing is used?  I guess 
because it would be annoying.  Maybe.

Note that integer indices, by contrast, are not recycled: each index
simply picks out the element at that position.  So

(1:4)[1:3] just (sensibly) gives

[1] 1 2 3

and *not*

[1] 1 2 3 1

Perhaps a bit subtle, but it gives what you'd actually *want* rather 
than being pedantic about rules with a result that you wouldn't want.
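
A short sketch of how positional (integer) indexing behaves: the result has one element per index entry, so an index shorter than the vector is not padded out. The name `x` is just for illustration.

```r
x <- 1:4

r1 <- x[1:3]          # 1 2 3: three indices, three results
r2 <- x[c(1, 1, 3)]   # 1 1 3: a position may be named more than once
r3 <- x[1:5]          # 1 2 3 4 NA: a position past the end gives NA

print(r1); print(r2); print(r3)
```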

cheers,

Rolf Turner

P.S.  If you do

y[1:3][!is.na(y[1:3])]

i.e. if you're careful to match the length of the vector and that of
the indices, you get what you initially expected.
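
Side by side, with y retyped from the printed output in the original post (so the values are rounded, which is an assumption about the actual data):

```r
# y as printed in the original post (7 decimal places).
y <- c(0.3534253, -1.6731597, NA, -0.2079209)

# Index of length 3 applied to y of length 4: the index is recycled,
# so the 4th element is selected by the reused first TRUE.
recycled <- y[!is.na(y[1:3])]
print(recycled)   # 0.3534253 -1.6731597 -0.2079209

# Index of length 3 applied to y[1:3]: lengths match, no recycling.
matched <- y[1:3][!is.na(y[1:3])]
print(matched)    # 0.3534253 -1.6731597
```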

R. T.

P^2.S.  To the younger and wiser heads on this list:  the help on "[" 
does not mention that the index vectors can be logical.  I couldn't find 
anything about logical indexing in the R help files.  Is something 
missing here, or am I just not looking in the right place?

R. T.

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

Jeff Newmiller

2019-Mar-10 05:07 UTC

[R] [FORGED] Q re: logical indexing with is.na

Regarding the mention of logical indexing, under ?Extract I see:

For [-indexing only: i, j, ... can be logical vectors, indicating
elements/slices to select. Such vectors are recycled if necessary to match the
corresponding extent. i, j, ... can also be negative integers, indicating
elements/slices to leave out of the selection.
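
Both kinds of index mentioned in that passage, tried on a small named vector (the vector `x` and its names are made up for illustration):

```r
x <- c(a = 10, b = 20, c = 30, d = 40)

# Logical index shorter than x: recycled to TRUE FALSE TRUE FALSE.
every_other <- x[c(TRUE, FALSE)]
print(every_other)   # a = 10, c = 30

# Negative integers: leave the named positions out of the selection.
middle <- x[-c(1, 4)]
print(middle)        # b = 20, c = 30
```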

-- 
Sent from my phone. Please excuse my brevity.

Izmirlian, Grant (NIH/NCI) [E]

2019-Mar-11 17:11 UTC

[R] Q re: logical indexing with is.na

Logical indexing expects the logical index to be of the same length as the
vector being indexed. If it is shorter, then the index is wrapped (recycled)
to be of sufficient length. The result on line 3 is
y[c(TRUE, TRUE, FALSE, TRUE)], where the last TRUE was
originally the first component of !is.na(y[1:3]).
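
The wrapping can be spelled out with rep_len() from base R. This is only a way to mimic the effect, not a claim about how `[` does it internally; y is retyped from the printed output in the original post.

```r
y <- c(0.3534253, -1.6731597, NA, -0.2079209)

idx  <- !is.na(y[1:3])             # TRUE TRUE FALSE
full <- rep_len(idx, length(y))    # TRUE TRUE FALSE TRUE (first TRUE reused)

print(full)
print(identical(y[idx], y[full]))  # TRUE
```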


Grant Izmirlian, Ph.D.
Mathematical Statistician
izmirlig at mail.nih.gov
