Displaying 20 results from an estimated 2000 matches similar to: "Bug: time complexity of substring is quadratic as string size and number of substrings increases"
2019 Feb 22
1
Bug: time complexity of substring is quadratic as string size and number of substrings increases
On 2/20/19 7:55 PM, Toby Hocking wrote:
> Update: I have observed that stringi::stri_sub has linear time complexity,
> and it computes the same thing as base::substring. Figure:
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
> Source:
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
>
> To me this is a
2019 Feb 20
0
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Update: I have observed that stringi::stri_sub has linear time complexity,
and it computes the same thing as base::substring. Figure:
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
Source:
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
To me this is a clear indication of a bug in substring, but again it would
be nice to have
2019 Feb 19
1
patch for gregexpr(perl=TRUE)
Hi all,
Several people have noticed that gregexpr is very slow for large subject
strings when perl=TRUE is specified.
- https://stackoverflow.com/questions/31216299/r-faster-gregexpr-for-very-large-strings
- http://r.789695.n4.nabble.com/strsplit-perl-TRUE-gregexpr-perl-TRUE-very-slow-for-long-strings-td4727902.html
- https://stat.ethz.ch/pipermail/r-help/2008-October/178451.html
I figured out
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting.
For consistency with other R functions and compatibility with this use
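A minimal base-R sketch of non-dropping extraction, assuming only the first match per element is wanted. `extract_keep_na()` is a hypothetical helper (not an existing base function) and the data frame is made-up example data: it keeps one output slot per input element and fills non-matches with `NA`, so the result can be assigned straight into a column.

```r
# Hypothetical helper: first regex match per element, NA where none.
# regmatches(x, regexpr(...)) alone returns only the matches, which is
# why its length can differ from length(x).
extract_keep_na <- function(x, pattern, ...) {
  m <- regexpr(pattern, x, ...)          # -1 (or NA) where there is no match
  hit <- !is.na(m) & m != -1             # same elements regmatches() keeps
  out <- rep(NA_character_, length(x))   # one slot per input element
  out[hit] <- regmatches(x, m)
  out
}

df <- data.frame(id = c("a12", "b", "c7"), stringsAsFactors = FALSE)
df$num <- extract_keep_na(df$id, "[0-9]+")
df
```

For named capture groups into several columns, the reply further down this list points to namedCapture::df_match_variable, rematch2::bind_re_match, and tidyr::extract.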
2020 Jun 27
1
Error in substring: invalid multibyte string
Thanks for the quick response, Ivan. readLines with encoding='latin1' works
for me (on Ubuntu).
However, I was more concerned with the inconsistency in results between
substr and regexpr. I was expecting that if one of them errors because of
an unknown encoding, then the other should as well. Even better, if regexpr
works, why shouldn't substr work as well?
Incidentally the analogous
2019 Aug 29
0
Feature request: non-dropping regmatches/strextract
If you want "to extract regex matches into a new column in a data.frame",
then there are some package functions that do exactly that. Three examples
are namedCapture::df_match_variable, rematch2::bind_re_match, and
tidyr::extract. For a more detailed discussion, see my R Journal submission
(under review) about regular expression packages,
2020 Jun 26
2
Error in substring: invalid multibyte string
Hi all,
I'm getting the following error from substring:
> substr("<I>Jens Oehlschl\xe4gel-Akiyoshi", 1, 100)
Error in substr("<I>Jens Oehlschl\xe4gel-Akiyoshi", 1, 100) :
invalid multibyte string at '<e4>gel-A<6b>iyoshi'
Is that normal / intended? I've tried setting the Encoding/locale to
Latin-1/UTF-8 but that does not help. nchar
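A hedged sketch of one workaround, assuming the `\xe4` byte is latin1 ("ä"), as the reply above suggests: convert the bytes to UTF-8 with `iconv()` before calling `substr()`. The reply's `readLines(encoding = 'latin1')` route instead marks the strings as latin1 rather than converting them; this sketch only shows the conversion route.

```r
# Raw latin1 byte \xe4 inside an otherwise ASCII string; in a UTF-8
# locale, substr() on the unmarked bytes can fail with
# "invalid multibyte string".
x <- "<I>Jens Oehlschl\xe4gel-Akiyoshi"

# Reinterpret the bytes as latin1 and convert them to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")

substr(y, 1, 100)   # now operates on valid UTF-8 characters
nchar(y)            # counts "ä" as a single character
```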
2020 Jun 09
2
valgrind false positive on R startup?
Hi all,
I'm on Ubuntu 18.04, running R 4.0.0, which I compiled from source, and
when using valgrind I always see the following message. Does anybody
else see that? Is that a known false positive? Any ideas how to
fix/suppress it? It seems related to TRE; do I need to upgrade that?
(base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind -e 'extSoftVersion()'
==9565==
2023 Mar 30
1
write.csv performance improvements?
Dear R-devel,
I did a systematic comparison of write.csv with similar functions, and
observed two asymptotic inefficiencies that could be improved.
1. write.csv is quadratic time (N^2) in the number of columns N.
Can write.csv be improved to use a linear time algorithm, so it can handle
CSV files with larger numbers of columns?
For more details including figures and session info, please see
2013 Sep 30
1
str_count counts the substring
I am trying to count the number of times a word occurs in a string,
using the str_count function from the stringr package. This function counts
occurrences inside other words as well.
Is there a way to exclude those substring matches and count only
exact word matches?
Thanks in advance.
--
Thanks and Regards
Agrima Srivastava
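One common approach, sketched in base R: wrap the word in `\b` word-boundary anchors so matches inside longer words are not counted. `count_word()` is a hypothetical helper, and it assumes `word` contains no regex metacharacters. With stringr, the same pattern can be passed to `str_count(x, "\\bthe\\b")`.

```r
# Hypothetical helper: count whole-word occurrences of `word` in each
# element of x. "\\b" is a word boundary, so "the" inside "theme" or
# "other" does not count.
count_word <- function(x, word) {
  pattern <- paste0("\\b", word, "\\b")
  # gregexpr() gives -1 when there is no match; regmatches() then returns
  # character(0), so lengths() reports 0 for that element.
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

count_word(c("the theme of the other day", "no match here"), "the")
# [1] 2 0
```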
2019 Feb 23
0
Bug: time complexity of substring is quadratic
> From: Tomas Kalibera <tomas.kalibera at gmail.com>
>
> Thanks for the report, I am working on a patch that will address this.
>
> I confirm there is a lot of potential for speedup. On my system,
>
> 'N=200000; x <- substring(paste(rep("A", N), collapse=""), 1:N, 1:N)'
>
> spends 96% of its time checking whether the string is ASCII and 3%
2006 May 17
1
install.packages bug (PR#8873)
Hello,
I've been using R for about 3 years now and I'm pretty sure this is a bug.
I'm using R 2.2.0.
The way R is set up to get packages from CRAN using install.packages is
really convenient --- if you are installing to your system's main package
directory. However, I observe the following problem:
I want package X but it requires package Y. Further, I have neither
package
2009 Dec 20
1
how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?
Last one for you guys:
The command:
length(gregexpr('cus','hocus pocus')[[1]])
[1] 2
returns the number of times the substring 'cus' appears in 'hocus pocus'
(which is two)
It's returning the number of **disjoint** matches. So:
length(gregexpr('aa','aaa')[[1]])
[1] 1
returns 1.
**What I want to do:**
I'm looking for a way to count
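A minimal sketch that counts overlapping occurrences without regular expressions, assuming a single input string and a literal (non-regex) pattern. `count_overlapping()` is a hypothetical helper: it compares every length-k window of the string against the pattern. A zero-width lookahead such as `gregexpr("(?=aa)", x, perl = TRUE)` is another commonly used way to get overlapping match positions.

```r
# Hypothetical helper: count all (including overlapping) occurrences of
# a literal pattern in a single string x.
count_overlapping <- function(x, pattern) {
  n <- nchar(x)
  k <- nchar(pattern)
  if (k == 0 || n < k) return(0L)
  starts <- seq_len(n - k + 1)                     # every possible start
  windows <- substring(x, starts, starts + k - 1)  # all length-k windows
  sum(windows == pattern)
}

count_overlapping("aaa", "aa")           # 2
count_overlapping("hocus pocus", "cus")  # 2
```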
2019 Oct 29
0
stats::reshape quadratic in number of input columns
Hi R-core,
I have been performance testing R packages for wide-to-tall data reshaping
and for the most part I see they differ by constant factors.
However in one test, which involves converting into multiple output
columns, I see that stats::reshape is in fact quadratic in the number of
input columns. For example take the iris data, which has 4 input columns to
reshape, and the desired output
2023 Mar 30
0
read.csv quadratic time in number of columns
Dear R-devel,
A number of people have observed anecdotally that read.csv is slow for
a large number of columns, for example:
https://stackoverflow.com/questions/7327851/read-csv-is-extremely-slow-in-reading-csv-files-with-large-numbers-of-columns
I did a systematic comparison of read.csv with similar functions, and
observed that read.csv is quadratic time (N^2) in the number of columns N,
whereas
2009 Jul 20
1
locate substring in the string it belongs to
Hi R users,
I am trying to generate the indices for locating a substring in the string it comes from.
Given the length of the string, using combn() for further comparison will take
too long. I am wondering if R has any built-in function for this
purpose.
To make it concrete:
this.substring="cc"
this.string="ccc"
start.location=c(1,2)
end.location=c(2,3)
Thanks in advance,
Kevin
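A sketch of one way to get these indices without `combn()`, assuming a literal pattern and a single string: slide a window of width `nchar(pattern)` across the string and keep the positions where it matches, so overlapping hits are found too. `locate_all()` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: start and end positions of every (possibly
# overlapping) occurrence of a literal pattern in a single string.
locate_all <- function(string, pattern) {
  n <- nchar(string)
  k <- nchar(pattern)
  if (k == 0 || n < k) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  starts <- seq_len(n - k + 1)
  hit <- substring(string, starts, starts + k - 1) == pattern
  data.frame(start = starts[hit], end = starts[hit] + k - 1L)
}

locate_all("ccc", "cc")
#   start end
# 1     1   2
# 2     2   3
```

This does one comparison per start position, so its cost grows linearly with the string length rather than with the number of index combinations.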
2010 Jul 07
3
use sliding window to count substrings found in large string
Hello everyone,
I'm looking for advice on how to do some tests on strings.
What I want to do is the following:
(just an example, real strings/sequence are about 200-400 characters long)
given set of Strings:
String1 abcdefgh
String2 bcdefgop
Use a sliding window of size x to create a vector of all subsequences
of size x
found in the set (order matters!).
Now create, for every string
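A sketch of the first step described above (building the vector of all size-k windows found in the set), assuming windows are contiguous substrings and k = 3 for illustration. `windows()`, `strings`, and `all_windows` are hypothetical names; what should be computed "for every string" afterwards is cut off in the post, so that step is not attempted here.

```r
# Hypothetical helper: all contiguous length-k windows of one string,
# in order of position.
windows <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  starts <- seq_len(n - k + 1)
  substring(s, starts, starts + k - 1)
}

strings <- c(String1 = "abcdefgh", String2 = "bcdefgop")
k <- 3

# Windows from every string, de-duplicated in order of first appearance.
all_windows <- unique(unlist(lapply(strings, windows, k = k), use.names = FALSE))
all_windows
# [1] "abc" "bcd" "cde" "def" "efg" "fgh" "fgo" "gop"
```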
2015 Sep 02
4
mclapply memory leak?
Dear R-devel,
I am running mclapply with many iterations over a function that modifies
nothing and makes no copies of anything. It is taking up a lot of memory,
so it seems to me like this is a bug. Should I post this to
bugs.r-project.org?
A minimal reproducible example can be obtained by first starting a memory
monitoring program such as htop, and then executing the following code
while
2009 Sep 25
7
Splitting columns, strings or reg exp returning substrings
Currently, the first column of my data frame holds string values in the format xx_yy. I want to create a new column with just the substring xx (for each row in turn). Three possible ways to do this might be: (1) split the string by '_' using strsplit and paste the first of the resulting pieces into a new column, but I have been unable to do this for each row of my data frame in turn
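Two vectorized base-R sketches that fill the new column for all rows at once, with no per-row loop. The column names `code`, `prefix1`, and `prefix2` and the example values are hypothetical. The first follows the strsplit idea from the post; the second swaps in a regular-expression substitution instead.

```r
df <- data.frame(code = c("ab_cd", "ef_gh", "ij_kl"), stringsAsFactors = FALSE)

# (1) strsplit() returns a list of pieces per row; take the first piece.
df$prefix1 <- vapply(strsplit(df$code, "_", fixed = TRUE), `[`, character(1), 1)

# (2) Delete the first "_" and everything after it.
df$prefix2 <- sub("_.*$", "", df$code)

df
```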
2011 Apr 11
1
Getting many substrings but only loading the original string one time.
Hi All,
I'm looking for a way to get many substrings from a longer string and
then stitch them together. But since the longer string is really, really
long (like 250 MB long), I don't want to do this in a loop and load and
re-load the longer string many times. Does anybody have an idea?
Maybe I could pass in two vectors (the first would have the starting
coordinates, and the second
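A sketch of the two-vector idea, assuming the second vector holds the end coordinates: `substring()` is vectorized over its `first` and `last` arguments, so one call extracts every piece from a single copy of the string. The short alphabet string stands in for the ~250 MB original, and the variable names are hypothetical. Note that the 2019 reports at the top of this list describe `substring()` as scaling quadratically with string size and number of substrings in the R version tested then (with stringi::stri_sub reported as linear), so timing may depend on your R version.

```r
# Stand-in for the very long string.
long_string <- paste(LETTERS, collapse = "")   # "ABC...Z"

starts <- c(1, 5, 20)   # start coordinates
ends   <- c(3, 7, 22)   # end coordinates (assumed meaning of 2nd vector)

# One vectorized call: long_string is recycled against starts/ends.
pieces <- substring(long_string, starts, ends)

# Stitch the pieces together.
stitched <- paste(pieces, collapse = "")
stitched
# [1] "ABCEFGTUV"
```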