Regarding the mention of logical indexing, under ?Extract I see:

    For [-indexing only: i, j, ... can be logical vectors, indicating
    elements/slices to select. Such vectors are recycled if necessary to
    match the corresponding extent. i, j, ... can also be negative
    integers, indicating elements/slices to leave out of the selection.

On March 9, 2019 6:57:05 PM PST, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>On 3/10/19 2:36 PM, David Goldsmith wrote:
>> Hi! Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
>> new to statistics (have had grad-level courses and work experience in
>> statistics) or vectorized programming syntax (have extensive experience
>> with MatLab, Python/NumPy, and IDL, and even a smidgen--a long time ago--of
>> experience w/ S-plus).
>>
>> In exploring the use of is.na in the context of logical indexing, I've come
>> across the following puzzling-to-me result:
>>
>> > y; !is.na(y[1:3]); y[!is.na(y[1:3])]
>> [1]  0.3534253 -1.6731597         NA -0.2079209
>> [1]  TRUE  TRUE FALSE
>> [1]  0.3534253 -1.6731597 -0.2079209
>>
>> As you can see, y is a four element vector, the third element of which is
>> NA; the next line gives what I would expect--T T F--because the first two
>> elements are not NA but the third element is. The third line is what
>> confuses me: why is the result not the two element vector consisting of
>> simply the first two elements of the vector (or, if vectorized indexing in
>> R is implemented to return a vector the same length as the logical index
>> vector, which appears to be the case, at least the first two elements and
>> then either NA or NaN in the third slot, where the logical indexing vector
>> is FALSE): why does the implementation "go looking" for an element whose
>> index in the "original" vector, 4, is larger than BOTH the largest index
>> specified in the inner-most subsetting index AND the size of the resulting
>> indexing vector? (Note: at first I didn't even understand why the result
>> wasn't simply
>>
>> 0.3534253 -1.6731597 NA
>>
>> but then I realized that the third logical index being FALSE, there was no
>> reason for *any* element to be there; but if there is, due to some
>> overriding rule regarding the length of the result relative to the length
>> of the indexer, shouldn't it revert back to *something* that indicates the
>> "FALSE"ness of that indexing element?)
>>
>> Thanks!
>
>It happens because R is eco-conscious and re-cycles. :-)
>
>Try:
>
>ok <- c(TRUE,TRUE,FALSE)
>(1:4)[ok]
>
>In general in R if there is an operation involving two vectors then
>the shorter one gets recycled to provide sufficiently many entries to
>match those of the longer vector.
>
>Thus in the foregoing example the first entry of "ok" gets used again,
>to make a length 4 vector to match up with 1:4. The result is the same
>as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].
>
>If you did (1:7)[ok] you'd get the same result as that from
>(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
>recycled 2 and 1/3 times.
>
>Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .
>
>Note that in the first two instances you get warnings, but in the third
>you don't, since 6 is an integer multiple of 3.
>
>Why aren't there warnings when logical indexing is used? I guess
>because it would be annoying. Maybe.
>
>Note that integer indices get recycled too, but the recycling is limited
>so as not to produce redundancies. So
>
>(1:4)[1:3] just (sensibly) gives
>
>[1] 1 2 3
>
>and *not*
>
>[1] 1 2 3 1
>
>Perhaps a bit subtle, but it gives what you'd actually *want* rather
>than being pedantic about rules with a result that you wouldn't want.
>
>cheers,
>
>Rolf Turner
>
>P.S. If you do
>
>y[1:3][!is.na(y[1:3])]
>
>i.e. if you're careful to match the length of the vector and that of
>the indices, you get what you initially expected.
>
>R. T.
>
>P^2.S. To the younger and wiser heads on this list: the help on "["
>does not mention that the index vectors can be logical. I couldn't find
>anything about logical indexing in the R help files. Is something
>missing here, or am I just not looking in the right place?
>
>R. T.

-- 
Sent from my phone. Please excuse my brevity.
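
For anyone following along, the recycling described in this message can be reproduced with a short sketch. The vector y below is an assumption: it simply reuses the four values printed in the original post, and the names recycled, short_index, full_index, matched and no_warning are made up for illustration.

```r
# Hypothetical vector mirroring the values shown in the thread
y <- c(0.3534253, -1.6731597, NA, -0.2079209)

ok <- !is.na(y[1:3])                 # length 3: TRUE TRUE FALSE
recycled <- rep_len(ok, length(y))   # length 4: TRUE TRUE FALSE TRUE

short_index <- y[ok]        # index is recycled, so element 4 is selected too
full_index  <- y[recycled]  # same selection, with the recycling written out
matched     <- y[1:3][ok]   # lengths match, so only elements 1 and 2 remain

# Arithmetic recycles as well; R warns only when the longer length is not
# a multiple of the shorter one, so this particular line is silent
no_warning <- 10 * (1:3) + 1:6
```

Comparing short_index with full_index shows that the length-3 logical index behaves as if it had been extended to length 4 before the selection was made.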
On 3/10/19 6:07 PM, Jeff Newmiller wrote:
> Regarding the mention of logical indexing, under ?Extract I see:
>
>     For [-indexing only: i, j, ... can be logical vectors, indicating
>     elements/slices to select. Such vectors are recycled if necessary to
>     match the corresponding extent. i, j, ... can also be negative
>     integers, indicating elements/slices to leave out of the selection.

Dang! It was staring me in the face all the time, and I didn't see it! Grrrrrr.

Thanks Jeff.

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Thanks, all. I had read about recycling, but I guess I didn't fully appreciate all the "weirdness" it might produce. :/

With this explained, I'm going to ask a follow-up, which is only contextually related: the impetus for this discovery was checking "corner cases" to determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to determine equality of two vectors containing NA's. Between the above result; my related discovery that this indexing preserves relative positional info but not absolute positional info; and the performance penalty when comparing long vectors that may be unequal "early on"; I've concluded that--if it (can be made to) "short circuit"--it would probably be better to use an implicit loop. So that's my Q: will (or can) an implicit loop (be made to) "exit early" if a specified condition is met before all indices have been checked?

Thanks again!

DLG
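
To make the "exit early" idea in this follow-up concrete, here is a minimal sketch using an explicit for loop, since a vectorised call such as all(x == y) evaluates the whole comparison before all() ever sees it. The helper name equal_with_na is hypothetical, and the rule it applies (two vectors are equal only if their NAs sit at the same positions and all other elements match) is an assumption about what is wanted, not something stated in the thread.

```r
# Hypothetical helper: position-aware equality that tolerates NAs and
# returns as soon as the first mismatch is found
equal_with_na <- function(x, y) {
  if (length(x) != length(y)) return(FALSE)
  for (i in seq_along(x)) {
    x_missing <- is.na(x[i])
    y_missing <- is.na(y[i])
    if (x_missing != y_missing) return(FALSE)          # NA on one side only
    if (!x_missing && x[i] != y[i]) return(FALSE)      # values differ
  }
  TRUE
}

a <- c(1, NA, 3)
b <- c(1, 3, NA)

same_positions  <- equal_with_na(a, a)                # NAs line up
moved_na        <- equal_with_na(a, b)                # NA is in a different slot
position_blind  <- all(a[!is.na(a)] == b[!is.na(b)])  # test from the post
```

Here position_blind is TRUE even though a and b differ, which illustrates the loss of absolute positional information mentioned above. An R-level loop like this is usually much slower per element than vectorised code, so it is likely to pay off only when mismatches tend to occur very early in long vectors.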
On 10/03/2019 1:15 a.m., David Goldsmith wrote:
> Thanks, all. I had read about recycling, but I guess I didn't fully
> appreciate all the "weirdness" it might produce. :/
>
> With this explained, I'm going to ask a follow-up, which is only
> contextually related: the impetus for this discovery was checking "corner
> cases" to determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to
> determine equality of two vectors containing NA's. Between the above
> result; my related discovery that this indexing preserves relative
> positional info but not absolute positional info; and the performance
> penalty when comparing long vectors that may be unequal "early on"; I've
> concluded that--if it (can be made to) "short circuit"--it would probably
> be better to use an implicit loop. So that's my Q: will (or can) an
> implicit loop (be made to) "exit early" if a specified condition is met
> before all indices have been checked?

You could use the identical() function. When I have vectors of length 1 million, all(x == y) takes about 3 milliseconds when the difference is in the last value, 2 milliseconds when it comes first. identical(x, y) takes about 5 milliseconds when the difference comes last, but 0.006 milliseconds when it comes first.

Of course, all(x == y) and identical(x, y) do slightly different tests: read the docs!

Duncan Murdoch

> Thanks again!
>
> DLG
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
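
As a rough illustration of how all(x == y) and identical(x, y) "do slightly different tests", the sketch below compares them on a few small inputs. The vectors and variable names are made up for the example; the documentation for identical() and all.equal() remains the authority on the exact rules.

```r
x <- c(1, 2, NA)
y <- c(1, 2, NA)

elementwise <- all(x == y)       # NA == NA is NA, so all() cannot say TRUE
whole_object <- identical(x, y)  # same type, length and values, NAs included

# identical() is strict about storage type; == coerces before comparing
int_vs_double_identical <- identical(1L, 1)
int_vs_double_equal     <- all(1L == 1)

# all.equal() allows a small numeric tolerance; wrap it in isTRUE() because
# it returns a description of the difference rather than FALSE
close_enough <- isTRUE(all.equal(1, 1 + 1e-10))
```

With NAs present, elementwise comes out as NA rather than TRUE or FALSE, so using it directly in an if() condition would raise an error, whereas identical() always returns a single TRUE or FALSE.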
Hi

Do you want something like this?

> x <- c(1,2,NA, 3, 4, 5, NA, 6,7,8, NA, NA, 9,10)
> y <- c(1,2,NA, NA, 3, 4, 5, 6, NA, 7,8, NA, NA, 9,10)
> identical(x[which(!is.na(x))], y[which(!is.na(y))])
[1] TRUE

If I expect NA and want to extract or compare something, I tend to use which to select only non NA elements.

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of David Goldsmith
> Sent: Sunday, March 10, 2019 7:16 AM
> Cc: r-help at r-project.org
> Subject: Re: [R] [FORGED] Q re: logical indexing with is.na
>
> Thanks, all. I had read about recycling, but I guess I didn't fully appreciate all
> the "weirdness" it might produce. :/
>
> With this explained, I'm going to ask a follow-up, which is only contextually
> related: the impetus for this discovery was checking "corner cases" to
> determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to determine equality of
> two vectors containing NA's. Between the above result; my related discovery
> that this indexing preserves relative positional info but not absolute positional
> info; and the performance penalty when comparing long vectors that may be
> unequal "early on"; I've concluded that--if it (can be made to) "short circuit"--it
> would probably be better to use an implicit loop. So that's my Q: will (or can)
> an implicit loop (be made to) "exit early" if a specified condition is met before
> all indices have been checked?
>
> Thanks again!
>
> DLG
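
A small sketch of where which() makes a difference when subsetting. With !is.na(x) the logical index never contains NA, so the two forms below agree; the difference appears when the condition itself can evaluate to NA. The vector v and the variable names are made up for illustration.

```r
v <- c(1, NA, 3, 4)

# is.na() never returns NA, so both forms select the same elements
via_logical <- v[!is.na(v)]
via_which   <- v[which(!is.na(v))]

# A comparison on v does return NA for the missing element;
# logical indexing keeps an NA in the result, which() drops it
cond_logical <- v[v > 2]
cond_which   <- v[which(v > 2)]
```

Here cond_logical has three elements (NA, 3, 4) while cond_which has two (3, 4), which is one reason to reach for which() when NAs are expected in the data.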