Displaying 20 results from an estimated 21 matches for "tdhock".
2019 Feb 20
2
Bug: time complexity of substring is quadratic as string size and number of substrings increases
...t, and many substrings to extract. For example
substring("AAAA", 1:4, 1:4)
or more generally,
N=1000
substring(paste(rep("A", N), collapse=""), 1:N, 1:N)
The problem I observe is that the time complexity is quadratic in N, as
shown on this figure
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
source:
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
I expected the time complexity to be linear in N.
The example above may seem contrived/trivial, but it is indeed relevant to
a number of packages (re...
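A minimal timing sketch of the example above, in R. The input sizes and the helper name time_substring are assumptions chosen for illustration and do not come from the original post. Whether the growth looks quadratic or linear will depend on the R version in use.

```r
# Hypothetical helper: time base::substring when extracting N
# one-character substrings from a subject string of length N.
time_substring <- function(N) {
  subject <- paste(rep("A", N), collapse = "")
  system.time(substring(subject, 1:N, 1:N))[["elapsed"]]
}

# Arbitrary sizes; doubling N makes the growth rate easy to read off.
sizes <- c(1000, 2000, 4000)
timings <- vapply(sizes, time_substring, numeric(1))
print(data.frame(N = sizes, seconds = timings))
```

If the time roughly quadruples each time N doubles, the growth is consistent with quadratic complexity; if it roughly doubles, it is consistent with linear complexity.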
2019 Feb 22
1
Bug: time complexity of substring is quadratic as string size and number of substrings increases
On 2/20/19 7:55 PM, Toby Hocking wrote:
> Update: I have observed that stringi::stri_sub is linear time complexity,
> and it computes the same thing as base::substring. figure
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
> source:
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
>
> To me this is a clear indication of a bug in substring, but again it would
> be nice to have some feedback/confirmation before p...
2020 Jun 09
2
valgrind false positive on R startup?
Hi all,
I'm on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and
using valgrind I am always seeing the following message. Does anybody
else see that? Is that a known false positive? Any ideas how to
fix/suppress? Seems related to TRE, do I need to upgrade that?
(base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind
-e 'extSoftVersion()'
==9565== Memcheck, a memory error detector
==9565== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9565== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9565== Comm...
2019 Feb 20
0
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Update: I have observed that stringi::stri_sub is linear time complexity,
and it computes the same thing as base::substring. figure
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
source:
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
To me this is a clear indication of a bug in substring, but again it would
be nice to have some feedback/confirmation before posting on bugzilla.
Als...
2020 Jun 10
0
valgrind false positive on R startup?
...'m on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and
> using valgrind I am always seeing the following message. Does anybody
> else see that? Is that a known false positive? Any ideas how to
> fix/suppress? Seems related to TRE, do I need to upgrade that?
>
> (base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind
> -e 'extSoftVersion()'
> ==9565== Memcheck, a memory error detector
> ==9565== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==9565== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyrigh...
2019 Feb 19
1
patch for gregexpr(perl=TRUE)
...tps://stat.ethz.ch/pipermail/r-help/2008-October/178451.html
I figured out the issue, which is fixed by changing 1 line of code in
src/main/grep.c -- there is a strlen function call which is currently
inside of the while loop over matches, and the patch moves it before the
loop.
https://github.com/tdhock/namedCapture-article/blob/master/linear-time-gregexpr-perl.patch
I made some figures that show the quadratic time complexity before applying
the patch, and the linear time complexity after applying the patch
https://github.com/tdhock/namedCapture-article#19-feb-2019
I would have posted a bug repo...
2023 Mar 30
1
write.csv performance improvements?
...iciencies that could be improved.
1. write.csv is quadratic time (N^2) in the number of columns N.
Can write.csv be improved to use a linear time algorithm, so it can handle
CSV files with larger numbers of columns?
For more details including figures and session info, please see
https://github.com/tdhock/atime/issues/9
2. write.csv uses memory that is linear in the number of rows, whereas
similar R functions for writing CSV use only constant memory. This is not
as important of an issue to fix, because anyway linear memory is used to
store the data in R. But since the other functions use constant m...
2006 May 17
1
install.packages bug (PR#8873)
...',lib='~/mylib',dependencies=TRUE)
However, this doesn't work. It does notice that X depends on Y and so it
downloads Y first, but it downloads Y to the wrong directory!! It should
download both to ~/mylib in my opinion!!!
Sincerely,
Toby Dylan Hocking
http://www.ocf.berkeley.edu/~tdhock
2015 Sep 02
4
mclapply memory leak?
...ction(i)Sys.sleep(1/N*seconds))
On my system, memory usage goes up about 60MB on this example. But it does
not go up at all if I change mclapply to lapply. Is this a bug?
For a more detailed discussion with a figure that shows that the memory
overhead is linear in N, please see
https://github.com/tdhock/mclapply-memory
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGE...
2015 Sep 02
0
mclapply memory leak?
...ge collector touches objects, as pointed out by Radford Neal
here:
http://r.789695.n4.nabble.com/Re-R-devel-Digest-Vol-149-Issue-22-td4710367.html
If so, I don't think this would be easily avoidable, but there may be
mitigation strategies.
~G
On Wed, Sep 2, 2015 at 10:12 AM, Toby Hocking <tdhock5 at gmail.com> wrote:
> Dear R-devel,
>
> I am running mclapply with many iterations over a function that modifies
> nothing and makes no copies of anything. It is taking up a lot of memory,
> so it seems to me like this is a bug. Should I post this to
> bugs.r-project.org?
>...
2019 Aug 29
0
Feature request: non-dropping regmatches/strextract
...hen there are some package functions which do exactly that. Three examples
are namedCapture::df_match_variable, rematch2::bind_re_match, and
tidyr::extract. For a more detailed discussion see my R journal submission
(under review) about regular expression packages,
https://raw.githubusercontent.com/tdhock/namedCapture-article/master/RJwrapper.pdf
Comments/suggestions welcome.
On Thu, Aug 15, 2019 at 12:15 AM Cyclic Group Z_1 via R-devel <
r-devel at r-project.org> wrote:
> A very common use case for regmatches is to extract regex matches into a
> new column in a data.frame (or data.tab...
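The use case quoted above can be sketched in base R. The example data and variable names are assumptions made up for illustration; the workaround shown is one common idiom, not the interface proposed in the thread.

```r
# Hypothetical input: the second element has no digit to match.
subject <- c("a1", "b", "c3")
m <- regexpr("[0-9]", subject)

# regmatches() drops non-matching elements, so the result is shorter
# than the input and cannot be assigned directly as a new column.
dropped <- regmatches(subject, m)

# Non-dropping workaround: pre-fill with NA, then assign only where
# regexpr() reported a match (it returns -1 for no match).
kept <- rep(NA_character_, length(subject))
kept[m != -1] <- regmatches(subject, m)

print(dropped)
print(kept)
```

Here dropped has two elements while kept has three, one per input string, which is the shape needed for a data.frame column.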
2019 Oct 29
0
stats::reshape quadratic in number of input columns
...as columns named
Species,Sepal,Petal,dimension (where part is either Length or Width). Of
course there is no performance issue with N=4 input columns in the original
iris data, but I made larger versions of this reshaping problem by making
copies of the input columns. The results
https://github.com/tdhock/nc-article#28-oct-2019 show that the quadratic
time complexity results in significant slowdowns after about N=10,000 input
columns to reshape. (e.g. several minutes for stats::reshape versus several
seconds for data.table::melt)
For a fix, I would suggest looking into how they implemented the same...
2023 Mar 30
0
read.csv quadratic time in number of columns
...that read.csv is quadratic time (N^2) in the number of columns N,
whereas the others are linear (N).
Can read.csv be improved to use a linear time algorithm, so it can handle
CSV files with larger numbers of columns?
For more details including figures and session info, please see
https://github.com/tdhock/atime/issues/8
Sincerely,
Toby Dylan Hocking
2023 Dec 19
1
Partial matching performance in data frame rownames using [
Hi Hilmar and Ivan,
I have used your code examples to write a blog post about this topic,
which has figures that show the asymptotic time complexity of the
various approaches,
https://tdhock.github.io/blog/2023/df-partial-match/
The asymptotic complexity of partial matching appears to be quadratic
O(N^2) whereas the other approaches are asymptotically faster: linear
O(N) or log-linear O(N log N).
I think that accepting Ivan's pmatch.rows patch would add unnecessary
complexity to b...
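For readers unfamiliar with the behavior under discussion, here is a small sketch of partial matching of row names in `[.data.frame`, next to exact matching with match(). The data frame and variable names are assumptions for illustration and are not taken from the thread or the blog post.

```r
# Hypothetical data frame with character row names.
df <- data.frame(x = 1:2, row.names = c("alpha", "beta"))

# `[.data.frame` partially matches row names, so the prefix "al"
# selects the row named "alpha".
partial <- df["al", "x"]

# match() requires an exact match, so the same prefix finds nothing.
exact_index <- match("al", rownames(df))

print(partial)
print(exact_index)
```

Looking rows up through match() avoids the partial-matching step when exact names are known.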
2019 May 14
3
Pcre install
Hello,
I downloaded R-3.6.0.tar.gz from https://cran.r-project.org/src/base/R-3/.
I tried to install R-3.6.0.tar.gz in Ubuntu system.
Thanks in advance for any help!
yue
checking for pcre.h... yes
checking pcre/pcre.h usability... no
checking pcre/pcre.h presence... no
checking for pcre/pcre.h... no
checking if PCRE version >= 8.20, < 10.0 and has UTF-8 support... no
checking
2023 Dec 16
2
Partial matching performance in data frame rownames using [
On Wed, 13 Dec 2023 09:04:18 +0100
Hilmar Berger via R-devel <r-devel at r-project.org> wrote:
> Still, I feel that default partial matching cripples the functionality
> of data.frame for larger tables.
Changing the default now would require a long deprecation cycle to give
everyone who uses `[.data.frame` and relies on partial matching
(whether they know it or not) enough time to
2015 Sep 03
0
mclapply memory leak?
Toby,
> On Sep 2, 2015, at 1:12 PM, Toby Hocking <tdhock5 at gmail.com> wrote:
>
> Dear R-devel,
>
> I am running mclapply with many iterations over a function that modifies
> nothing and makes no copies of anything. It is taking up a lot of memory,
> so it seems to me like this is a bug. Should I post this to
> bugs.r-project.o...
2019 May 29
2
R pkg install should fail for unsuccessful DLL copy on windows?
...l_0.2.2 geometry_0.4.1
[17] tools_3.6.0 glue_1.3.1 purrr_0.3.2 munsell_0.5.0
[21] abind_1.4-7 compiler_3.6.0 pkgconfig_2.0.2 colorspace_1.4-1
[25] tidyselect_0.2.5 tibble_2.1.1
>
>
th798 at cmp2986 MINGW64 ~/R
$
related blog post: https://tdhock.github.io/blog/2019/windows-dll/
2019 May 30
0
R pkg install should fail for unsuccessful DLL copy on windows?
Hi Toby,
AFAIK it has not been addressed in R. You can handle the problem on
your package side, see
https://github.com/Rdatatable/data.table/pull/3237
Regards,
Jan
On Thu, May 30, 2019 at 4:46 AM Toby Hocking <tdhock5 at gmail.com> wrote:
>
> Hi all,
>
> I am having an issue related to installing packages on windows with
> R-3.6.0. When installing a package that is in use, I expected R to stop
> with an error. However I am getting a warning that the DLL copy was not
> successful, but the...
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting.
For consistency with other R functions and compatibility with this use