search for: tdhock

Displaying 20 results from an estimated 21 matches for "tdhock".

2019 Feb 20
2
Bug: time complexity of substring is quadratic as string size and number of substrings increases
...t, and many substrings to extract. For example substring("AAAA", 1:4, 1:4) or more generally, N=1000 substring(paste(rep("A", N), collapse=""), 1:N, 1:N) The problem I observe is that the time complexity is quadratic in N, as shown on this figure https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png source: https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R I expected the time complexity to be linear in N. The example above may seem contrived/trivial, but it is indeed relevant to a number of packages (re...
2019 Feb 22
1
Bug: time complexity of substring is quadratic as string size and number of substrings increases
On 2/20/19 7:55 PM, Toby Hocking wrote: > Update: I have observed that stringi::stri_sub is linear time complexity, > and it computes the same thing as base::substring. figure > https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png > source: > https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R > > To me this is a clear indication of a bug in substring, but again it would > be nice to have some feedback/confirmation before p...
2020 Jun 09
2
valgrind false positive on R startup?
Hi all, I'm on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and using valgrind I am always seeing the following message. Does anybody else see that? Is that a known false positive? Any ideas how to fix/suppress? Seems related to TRE, do I need to upgrade that?
(base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind -e 'extSoftVersion()'
==9565== Memcheck, a memory error detector
==9565== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9565== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9565== Comm...
2019 Feb 20
0
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Update: I have observed that stringi::stri_sub is linear time complexity, and it computes the same thing as base::substring. figure https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png source: https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R To me this is a clear indication of a bug in substring, but again it would be nice to have some feedback/confirmation before posting on bugzilla. Als...
2020 Jun 10
0
valgrind false positive on R startup?
...'m on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and > using valgrind I am always seeing the following message. Does anybody > else see that? Is that a known false positive? Any ideas how to > fix/suppress? Seems related to TRE, do I need to upgrade that? > > (base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind > -e 'extSoftVersion()' > ==9565== Memcheck, a memory error detector > ==9565== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. > ==9565== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyrigh...
2019 Feb 19
1
patch for gregexpr(perl=TRUE)
...tps://stat.ethz.ch/pipermail/r-help/2008-October/178451.html I figured out the issue, which is fixed by changing 1 line of code in src/main/grep.c -- there is a strlen function call which is currently inside of the while loop over matches, and the patch moves it before the loop. https://github.com/tdhock/namedCapture-article/blob/master/linear-time-gregexpr-perl.patch I made some figures that show the quadratic time complexity before applying the patch, and the linear time complexity after applying the patch https://github.com/tdhock/namedCapture-article#19-feb-2019 I would have posted a bug repo...
2023 Mar 30
1
write.csv performance improvements?
...iciencies that could be improved. 1. write.csv is quadratic time (N^2) in the number of columns N. Can write.csv be improved to use a linear time algorithm, so it can handle CSV files with larger numbers of columns? For more details including figures and session info, please see https://github.com/tdhock/atime/issues/9 2. write.csv uses memory that is linear in the number of rows, whereas similar R functions for writing CSV use only constant memory. This is not as important of an issue to fix, because anyway linear memory is used to store the data in R. But since the other functions use constant m...
2006 May 17
1
install.packages bug (PR#8873)
...',lib='~/mylib',dependencies=TRUE) However, this doesn't work. It does notice that X depends on Y and so it downloads Y first, but it downloads Y to the wrong directory!! It should download both to ~/mylib in my opinion!!! Sincerely, Toby Dylan Hocking http://www.ocf.berkeley.edu/~tdhock
2015 Sep 02
4
mclapply memory leak?
...ction(i)Sys.sleep(1/N*seconds)) On my system, memory usage goes up about 60MB on this example. But it does not go up at all if I change mclapply to lapply. Is this a bug? For a more detailed discussion with a figure that shows that the memory overhead is linear in N, please see https://github.com/tdhock/mclapply-memory > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu precise (12.04.5 LTS) locale: [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_CA.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGE...
2015 Sep 02
0
mclapply memory leak?
...ge collector touches objects, as pointed out by Radford Neal here: http://r.789695.n4.nabble.com/Re-R-devel-Digest-Vol-149-Issue-22-td4710367.html If so, I don't think this would be easily avoidable, but there may be mitigation strategies. ~G On Wed, Sep 2, 2015 at 10:12 AM, Toby Hocking <tdhock5 at gmail.com> wrote: > Dear R-devel, > > I am running mclapply with many iterations over a function that modifies > nothing and makes no copies of anything. It is taking up a lot of memory, > so it seems to me like this is a bug. Should I post this to > bugs.r-project.org? ...
2019 Aug 29
0
Feature request: non-dropping regmatches/strextract
...hen there are some package functions which do exactly that. three examples are namedCapture::df_match_variable, rematch2::bind_re_match, and tidyr::extract. For a more detailed discussion see my R journal submission (under review) about regular expression packages, https://raw.githubusercontent.com/tdhock/namedCapture-article/master/RJwrapper.pdf Comments/suggestions welcome. On Thu, Aug 15, 2019 at 12:15 AM Cyclic Group Z_1 via R-devel < r-devel at r-project.org> wrote: > A very common use case for regmatches is to extract regex matches into a > new column in a data.frame (or data.tab...
2019 Oct 29
0
stats::reshape quadratic in number of input columns
...as columns named Species,Sepal,Petal,dimension (where part is either Length or Width). Of course there is no performance issue with N=4 input columns in the original iris data, but I made larger versions of this reshaping problem by making copies of the input columns. The results https://github.com/tdhock/nc-article#28-oct-2019 show that the quadratic time complexity results in significant slowdowns after about N=10,000 input columns to reshape. (e.g. several minutes for stats::reshape versus several seconds for data.table::melt) For a fix, I would suggest looking into how they implemented the same...
2023 Mar 30
0
read.csv quadratic time in number of columns
...that read.csv is quadratic time (N^2) in the number of columns N, whereas the others are linear (N). Can read.csv be improved to use a linear time algorithm, so it can handle CSV files with larger numbers of columns? For more details including figures and session info, please see https://github.com/tdhock/atime/issues/8 Sincerely, Toby Dylan Hocking
2023 Dec 19
1
Partial matching performance in data frame rownames using [
Hi Hilmar and Ivan, I have used your code examples to write a blog post about this topic, which has figures that show the asymptotic time complexity of the various approaches, https://tdhock.github.io/blog/2023/df-partial-match/ The asymptotic complexity of partial matching appears to be quadratic O(N^2) whereas the other approaches are asymptotically faster: linear O(N) or log-linear O(N log N). I think that accepting Ivan's pmatch.rows patch would add unnecessary complexity to b...
2019 May 14
3
Pcre install
Hello, I downloaded R-3.6.0.tar.gz from https://cran.r-project.org/src/base/R-3/. I tried to install R-3.6.0.tar.gz in Ubuntu system. Thanks in advance for any help! yue
checking for pcre.h... yes
checking pcre/pcre.h usability... no
checking pcre/pcre.h presence... no
checking for pcre/pcre.h... no
checking if PCRE version >= 8.20, < 10.0 and has UTF-8 support... no
checking
2023 Dec 16
2
Partial matching performance in data frame rownames using [
On Wed, 13 Dec 2023 09:04:18 +0100 Hilmar Berger via R-devel <r-devel at r-project.org> wrote: > Still, I feel that default partial matching cripples the functionality > of data.frame for larger tables. Changing the default now would require a long deprecation cycle to give everyone who uses `[.data.frame` and relies on partial matching (whether they know it or not) enough time to
2015 Sep 03
0
mclapply memory leak?
Toby, > On Sep 2, 2015, at 1:12 PM, Toby Hocking <tdhock5 at gmail.com> wrote: > > Dear R-devel, > > I am running mclapply with many iterations over a function that modifies > nothing and makes no copies of anything. It is taking up a lot of memory, > so it seems to me like this is a bug. Should I post this to > bugs.r-project.o...
2019 May 29
2
R pkg install should fail for unsuccessful DLL copy on windows?
...l_0.2.2 geometry_0.4.1 [17] tools_3.6.0 glue_1.3.1 purrr_0.3.2 munsell_0.5.0 [21] abind_1.4-7 compiler_3.6.0 pkgconfig_2.0.2 colorspace_1.4-1 [25] tidyselect_0.2.5 tibble_2.1.1 > > th798 at cmp2986 MINGW64 ~/R $ related blog post: https://tdhock.github.io/blog/2019/windows-dll/
2019 May 30
0
R pkg install should fail for unsuccessful DLL copy on windows?
Hi Toby, AFAIK it has not been addressed in R. You can handle the problem on your package side, see https://github.com/Rdatatable/data.table/pull/3237 Regards, Jan On Thu, May 30, 2019 at 4:46 AM Toby Hocking <tdhock5 at gmail.com> wrote: > > Hi all, > > I am having an issue related to installing packages on windows with > R-3.6.0. When installing a package that is in use, I expected R to stop > with an error. However I am getting a warning that the DLL copy was not > successful, but the...
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting. For consistency with other R functions and compatibility with this use