thr3ads.net - R help - [R] which() vs. just logical selection in df [Oct 2020]

1/k^c

2020-Oct-14 22:23 UTC

[R] which() vs. just logical selection in df

Hi Dr. Snow, & R-helpers,

Thank you for your reply! I hadn't heard of the {microbenchmark}
package & was excited to try it! Thank you for the suggestion! I did
check the source for which() beforehand, which includes the step
that removes NAs, and my data have no missing values:

sum(is.na(dat$gender2))
sum(is.na(dat$gender))
sum(is.na(dat$y))

[1] 0
[1] 0
[1] 0

I still had a 10ms difference in the value returned by microbenchmark
between the following methods: one with and one without using which().
The difference is reversed from what I expected, since which() is an
extra step.

microbenchmark(
  head(
    dat[which(dat$gender2=="other"),],), times=100L)
microbenchmark(
  head(
    dat[dat$gender2=="other",],), times=100L)

expr                                               min       lq     mean
head(dat[which(dat$gender2 == "other"), ], )  62.93803 74.25939  88.4704
head(dat[dat$gender2 == "other", ], )          71.8914 87.95844 103.7231

Is which() invoking C-level code by chance, making it slightly faster
on average? The difference would likely matter on terabytes of
data. The addition of which() still seems superfluous to me, and I'd
like to know whether it's considered best practice to keep it. What is
R invoking when which() isn't called explicitly? Is R invoking which()
eventually anyway?

Cheers!
Keith
> Message: 2
> Date: Mon, 12 Oct 2020 13:01:36 -0600
> From: Greg Snow <538280 at gmail.com>
> To: "1/k^c" <kchamberln at gmail.com>
> Cc: r-help <r-help at r-project.org>
> Subject: Re: [R] which() vs. just logical selection in df
> Message-ID:
>         <CAFEqCdyUuHh5TZ7t5NJ8cs_4xB61mNeUgasncekD485eBNRK6Q at
mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I would suggest using the microbenchmark package to do the time
> comparison.  This will run each a bunch of times for a more meaningful
> comparison.
>
> One possible reason for the difference is the number of missing values
> in your data (along with the number of columns).  Consider the
> difference in the following results:
>
> > x <- c(1,2,NA)
> > x[x==1]
> [1]  1 NA
> > x[which(x==1)]
> [1] 1
>
>
>
> On Sat, Oct 10, 2020 at 5:25 PM 1/k^c <kchamberln at gmail.com> wrote:
> >
> > Hi R-helpers,
> >
> > Does anyone know why adding which() makes the select call more
> > efficient than just using logical selection in a dataframe? Doesn't
> > which() technically add another conversion/function call on top of the
> > logical selection? Here is a reproducible example with a slight
> > difference in timing.
> >
> > # Surrogate data - the timing here isn't interesting
> > urltext <- paste("https://drive.google.com/",
> >                  "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
> >                  "-h8&export=download", sep="")
> > download.file(url=urltext, destfile="tempfile.csv") # download file first
> > dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE, header=TRUE,
> >                 nrows=2.5e6) # read the file; 'nrows' is a slight
> >                              # overestimate
> > dat <- dat[,1:3] # select just the first 3 columns
> > head(dat, 10) # print the first 10 rows
> >
> > # Select using which() as the final step ~ 90ms total time on my macbook air
> > system.time(
> >   head(
> >     dat[which(dat$gender2=="other"),],),
> >   gcFirst=TRUE)
> >
> > # Select skipping which() ~130ms total time
> > system.time(
> >   head(
> >     dat[dat$gender2=="other", ]),
> >   gcFirst=TRUE)
> >
> > Now I would think that the second one without which() would be more
> > efficient. However, every time I run these, the first version, with
> > which() is more efficient by about 20ms of system time and 20ms of
> > user time. Does anyone know why this is?
> >
> > Cheers!
> > Keith
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 12 Oct 2020 08:33:44 +0200 (CEST)
> From: =?UTF-8?Q?Frauke_G=C3=BCnther?= <guenther at leibniz-bips.de>
> To: "r-help at r-project.org" <r-help at r-project.org>
> Cc: William Michels <wjm1 at caa.columbia.edu>, "smm at
posteo.org"
>         <smm at posteo.org>
> Subject: Re: [R]  Fwd:  Help using the exclude option in the neuralnet
>         package
> Message-ID: <957726669.124476.1602484424752 at srvmail.bips.eu>
> Content-Type: text/plain; charset="utf-8"
>
> Dear all,
>
> the exclude and constant.weights options are used as follows:
>
> exclude: A matrix with n rows and 3 columns will exclude n weights. The
> first column refers to the layer, the second column to the input neuron and
> the third column to the output neuron of the weight.
>
> constant.weights: A vector specifying the values of the weights that are
> excluded from the training process and treated as fixed.
>
> Please refer to the following example:
>
> Not using exclude and constant.weights (all weights are trained):
>
> > nn <- neuralnet(Species == "setosa" ~ Petal.Length +
Petal.Width, iris, linear.output = FALSE)
> >
> > nn$weights
> [[1]]
> [[1]][[1]]
> [,1]
> [1,] 6.513239
> [2,] -0.815920
> [3,] -5.859802
> [[1]][[2]]
> [,1]
> [1,] -4.597934
> [2,] 9.179436
>
> Using exclude (2 weights are excluded --> NA):
>
> > nn <- neuralnet(Species == "setosa" ~ Petal.Length +
Petal.Width, iris, linear.output = FALSE,
> exclude = matrix(c(1,2,1, 2,2,1),byrow=T, nrow=2))
> > nn$weights
> [[1]]
> [[1]][[1]]
> [,1]
> [1,] -0.2815942
> [2,] NA
> [3,] 0.2481212
> [[1]][[2]]
> [,1]
> [1,] -0.6934932
> [2,] NA
>
> Using exclude and constant.weights (2 weights are excluded and treated as
> fixed --> 100 and 1000, respectively):
>
> > nn <- neuralnet(Species == "setosa" ~ Petal.Length +
Petal.Width, iris, linear.output = FALSE,
> exclude = matrix(c(1,2,1, 2,2,1),byrow=T, nrow=2),
> constant.weights=c(100,1000))
> > nn$weights
> [[1]]
> [[1]][[1]]
> [,1]
> [1,] 0.554119
> [2,] 100.000000
> [3,] 1.153611
> [[1]][[2]]
> [,1]
> [1,] -0.3962524
> [2,] 1000.0000000
>
> I hope you will find this example helpful.
>
> Sincerely,
> Frauke
>
>
> >     William Michels <wjm1 at caa.columbia.edu mailto:wjm1 at
caa.columbia.edu > hat am 10.10.2020 18:16 geschrieben:
> >
> >
> >     Forwarding: Question re "neuralnet" package on the
R-Help mailing list:
> >
> >     https://stat.ethz.ch/pipermail/r-help/2020-October/469020.html
> >
> >     If you are so inclined, please reply to:
> >
> >     r-help at r-project.org mailto:r-help at r-project.org <r-help
at r-project.org mailto:r-help at r-project.org >
> >
> >     ---------- Forwarded message ---------
> >     From: Dan Ryan <Dan.Ryan at unbc.ca mailto:Dan.Ryan at unbc.ca
>
> >     Date: Fri, Oct 9, 2020 at 3:52 PM
> >     Subject: Re: [R] Help using the exclude option in the neuralnet
package
> >     To: r-help at r-project.org mailto:r-help at r-project.org
<r-help at r-project.org mailto:r-help at r-project.org >
> >
> >     Good Morning,
> >
> >     I am using the neuralnet package in R, and am able to produce some
> >     basic neural nets, and use the output.
> >
> >     I would like to exclude some of the weights and biases from the
> >     iteration process and fix their values.
> >
> >     However I do not seem to be able to correctly define the exclude and
> >     constant.weights vectors.
> >
> >     Question: Can someone point me to an example where exclude and
> >     constant.weights are used? I have searched the R help archive, and
> >     haven't found any examples which use these on the web.
> >
> >     Thank you in advance for any help.
> >
> >     Sincerely
> >
> >     Dan
> >
> >     [[alternative HTML version deleted]]
> >
> >     ______________________________________________
> >     R-help at r-project.org mailto:R-help at r-project.org mailing
list -- To UNSUBSCRIBE and more, see
> >     https://stat.ethz.ch/mailman/listinfo/r-help
> >     PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> >     and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 13 Oct 2020 08:04:32 +0200
> From: Ablaye Ngalaba <ablayengalaba at gmail.com>
> To: R-help at r-project.org
> Subject: [R] package for kernel on R
> Message-ID:
>         <CAOkWQv2YoQPpsBUJzV3i4EhAYHNRVZP3vuRXeBA28fLKSUdeqA at
mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
> Please, I want to know which package to install on R when coding the kernel
> functions
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 13 Oct 2020 09:09:00 +0200
> From: Ablaye Ngalaba <ablayengalaba at gmail.com>
> To: R-help at r-project.org
> Subject: [R] help for R code
> Message-ID:
>         <CAOkWQv0LsgxkHdqpai1=9BpLmp6tAdNwZiqTihA8zrirkf2yFQ at
mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Good morning dear administrators,
> Please help me to code this code in R.
> I use in this file the redescription function ? which by making a scalar
> product gives a . You can also choose instead of the redescription function
> ? a kernel k(x,x).
>
>
>
>
>                   Sincerely
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 13 Oct 2020 11:21:45 +0300
> From: Eric Berger <ericjberger at gmail.com>
> To: Ablaye Ngalaba <ablayengalaba at gmail.com>
> Cc: R mailing list <R-help at r-project.org>
> Subject: Re: [R] help for R code
> Message-ID:
>         <CAGgJW74TP-+L6gg0_BLbnayL657Ejw+_fvQ+tScsaDgEj8vQDA at
mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Ablaye,
> The CRAN repository has thousands of available R packages. To help
> people find relevant packages amid such a huge collection, there are
> some 'task view' pages that group packages according to a
particular
> task. I am guessing that you are interested in kernels because of
> their use in machine learning, so you might want to look at the
> Machine Learning task view at:
>
> https://cran.r-project.org/web/views/MachineLearning.html
>
> If you search for 'kernels' on that page you will find
>
> 'Support Vector Machines and Kernel Methods' which mentions a few
> packages that use kernels.
>
> Good luck,
> Eric
>
>
> On Tue, Oct 13, 2020 at 10:09 AM Ablaye Ngalaba <ablayengalaba at
gmail.com> wrote:
> >
> > Good morning dear administrators,
> > Please help me to code this code in R.
> > I use in this file the redescription function ? which by making a
scalar
> > product gives a . You can also choose instead of the redescription
function
> > ? a kernel k(x,x).
> >
> >
> >
> >
> >                   Sincerely
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> ------------------------------
>
> End of R-help Digest, Vol 212, Issue 12
> ***************************************

Bert Gunter

2020-Oct-14 22:42 UTC

[R] which() vs. just logical selection in df

Inline.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Oct 14, 2020 at 3:23 PM 1/k^c <kchamberln at gmail.com> wrote:

> Is which() invoking c-level code by chance, making it slightly faster
> on average?

You do not need to ask such questions. R is open source, so just look!

> which
function (x, arr.ind = FALSE, useNames = TRUE)
{
    wh <- .Internal(which(x))   ## C code
    if (arr.ind && !is.null(d <- dim(x)))
        arrayInd(wh, d, dimnames(x), useNames = useNames)
    else wh
}
<bytecode: 0x7fcdba0b8e80>
<environment: namespace:base>
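
As a small illustration of what the function above hands back (a sketch that is not part of the original reply; the vector `x` is the one from Greg Snow's earlier example and `y` is made up here), which() turns a logical vector into the integer positions of its TRUE elements and drops any NA along the way:

```r
# Toy vector taken from Greg Snow's earlier example in this thread.
x <- c(1, 2, NA)

lgl <- x == 1        # logical index, same length as x: TRUE FALSE NA
pos <- which(lgl)    # integer positions of the TRUE elements only

x[lgl]   # the NA in the index produces an NA in the result
x[pos]   # which() has already dropped the NA

# With no NAs in the index, both forms select the same elements.
# 'y' is a hypothetical vector used only for this illustration.
y <- c("a", "other", "b", "other")
identical(y[y == "other"], y[which(y == "other")])
```

So when the column has no missing values, the two indexing styles differ only in the kind and length of index that is passed to `[`, not in the rows that come back.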

1/k^c

2020-Oct-15 02:23 UTC

[R] which() vs. just logical selection in df

Hi Bert,

Thank you very much! I was unaware that .Internal() referred to C code.

I figured out the difference. which() shrinks the index to just the
matching row numbers before the data frame is subset, whereas a
logical index stays full length (one element per row) and is only
reduced during the subsetting itself.

> length(index1<-dat$gender2=="other")
[1] 2000000
> length(index2<-which(index1))
[1] 666667
> nrow(dat[index1,])
[1] 666667
> nrow(dat[index2,])
[1] 666667

# Build the logical index over 2e6 records, ~13ms.
microbenchmark(index1<-dat$gender2=="other", times=100L)
# Extra time for which(), ~5ms.
microbenchmark(index2<-which(index1), times=100L)
# Time to return just the TRUE records using the whole 2e6 index, ~99ms.
microbenchmark(dat[index1,], times=100L)
# Time to return all records from the shorter index, ~64ms.
microbenchmark(dat[index2,], times=100L)
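
The same comparison can be tried without downloading the file. This is a minimal sketch under stated assumptions: the data frame below is synthetic (only the column names `gender2` and `y` come from the thread; the values and the row count are invented), and `time_it()` is a hypothetical helper built on base R's system.time() rather than {microbenchmark}. Timings will vary by machine, so none are quoted.

```r
# Synthetic stand-in for the thread's data; column names follow the thread,
# the values and the row count are assumptions made for illustration.
n <- 3e5
dat <- data.frame(
  gender2 = rep_len(c("female", "male", "other"), n),
  y       = seq_len(n),
  stringsAsFactors = FALSE
)

index1 <- dat$gender2 == "other"   # logical, one element per row
index2 <- which(index1)            # integer, one element per matching row

length(index1)   # equals the number of rows
length(index2)   # equals the number of rows where gender2 == "other"

# Both indices select the same rows when the index contains no NAs.
all.equal(dat[index1, ], dat[index2, ])

# Hypothetical helper: average elapsed seconds over 'reps' calls (base R only).
time_it <- function(f, reps = 20) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]] / reps
}
t_logical <- time_it(function() dat[index1, ])
t_integer <- time_it(function() dat[index2, ])
c(logical_index = t_logical, integer_index = t_integer)
```

Separating the index construction from the subsetting, as in the calls above, makes it possible to see which step accounts for the difference.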

Cheers,
Keith


On Wed, Oct 14, 2020 at 4:42 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> Inline.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Oct 14, 2020 at 3:23 PM 1/k^c <kchamberln at gmail.com> wrote:
>
>> Is which() invoking c-level code by chance, making it slightly faster
>> on average?
>
>
> You do not need to ask such questions. R is open source, so just look!
>
> > which
> function (x, arr.ind = FALSE, useNames = TRUE)
> {
>     wh <- .Internal(which(x))   ## C code
>     if (arr.ind && !is.null(d <- dim(x)))
>         arrayInd(wh, d, dimnames(x), useNames = useNames)
>     else wh
> }
> <bytecode: 0x7fcdba0b8e80>
> <environment: namespace:base>
