similar to: Bug: time complexity of substring is quadratic as string size and number of substrings increases

Displaying 20 results from an estimated 2000 matches similar to: "Bug: time complexity of substring is quadratic as string size and number of substrings increases"

2019 Feb 22
1
Bug: time complexity of substring is quadratic as string size and number of substrings increases
On 2/20/19 7:55 PM, Toby Hocking wrote: > Update: I have observed that stringi::stri_sub is linear time complexity, > and it computes the same thing as base::substring. figure > https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png > source: > https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R > > To me this is a
2019 Feb 20
0
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Update: I have observed that stringi::stri_sub is linear time complexity, and it computes the same thing as base::substring. figure https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png source: https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R To me this is a clear indication of a bug in substring, but again it would be nice to have
2019 Feb 19
1
patch for gregexpr(perl=TRUE)
Hi all, Several people have noticed that gregexpr is very slow for large subject strings when perl=TRUE is specified. - https://stackoverflow.com/questions/31216299/r-faster-gregexpr-for-very-large-strings - http://r.789695.n4.nabble.com/strsplit-perl-TRUE-gregexpr-perl-TRUE-very-slow-for-long-strings-td4727902.html - https://stat.ethz.ch/pipermail/r-help/2008-October/178451.html I figured out
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting. For consistency with other R functions and compatibility with this use
2020 Jun 27
1
Error in substring: invalid multibyte string
Thanks for the quick response Ivan. readLines with encoding='latin1' works for me (on Ubuntu). However I was more concerned with the inconsistency in results between substr and regexpr. I was expecting that if one of them errors because of an unknown encoding then the other should as well. Even better, if regexpr works, why shouldn't substr work as well? Incidentally the analogous
2019 Aug 29
0
Feature request: non-dropping regmatches/strextract
if you want "to extract regex matches into a new column in a data.frame" then there are some package functions which do exactly that. three examples are namedCapture::df_match_variable, rematch2::bind_re_match, and tidyr::extract. For a more detailed discussion see my R journal submission (under review) about regular expression packages,
2020 Jun 26
2
Error in substring: invalid multibyte string
Hi all, I'm getting the following error from substring: > substr("<I>Jens Oehlschl\xe4gel-Akiyoshi", 1, 100) Error in substr("<I>Jens Oehlschl\xe4gel-Akiyoshi", 1, 100) : invalid multibyte string at '<e4>gel-A<6b>iyoshi' Is that normal / intended? I've tried setting the Encoding/locale to Latin-1/UTF-8 but that does not help. nchar
2020 Jun 09
2
valgrind false positive on R startup?
Hi all, I'm on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and using valgrind I am always seeing the following message. Does anybody else see that? Is that a known false positive? Any ideas how to fix/suppress? Seems related to TRE, do I need to upgrade that? (base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind -e 'extSoftVersion()' ==9565==
2023 Mar 30
1
write.csv performance improvements?
Dear R-devel, I did a systematic comparison of write.csv with similar functions, and observed two asymptotic inefficiencies that could be improved. 1. write.csv is quadratic time (N^2) in the number of columns N. Can write.csv be improved to use a linear time algorithm, so it can handle CSV files with larger numbers of columns? For more details including figures and session info, please see
2013 Sep 30
1
str_count counts the substring
I am trying to count the number of times a word occurs in a string. and using str_count function from the package stringr. This function counts the substrings as well. Is there a way in which I can exclude the substring count and just take the exact match. Thanks in advance. -- Thanks and Regards Agrima Srivastava -------------------------------------------------------------------------------
2019 Feb 23
0
Bug: time complexity of substring is quadratic
> From: Tomas Kalibera <tomas.kalibera at gmail.com> > > Thanks for the report, I am working on a patch that will address this. > > I confirm there is a lot of potential for speedup. On my system, > > 'N=200000; x <- substring(paste(rep("A", N), collapse=""), 1:N, 1:N)' > > spends 96% time in checking if the string is ascii and 3%
2006 May 17
1
install.packages bug (PR#8873)
Hello, I've been using R for about 3 years now and I'm pretty sure this is a bug. I'm using R 2.2.0. The way R is set up to get packages from CRAN using install.packages is really convenient --- if you are installing to your system's main package directory. However, I observe the following problem: I want package X but it requires package Y. Further, I have neither package
2009 Dec 20
1
how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?
Last one for you guys: The command: length(gregexpr('cus','hocus pocus')[[1]]) [1] 2 returns the number of times the substring 'cus' appears in 'hocus pocus' (which is two) It's returning the number of **disjoint** matches. So: length(gregexpr('aa','aaa')[[1]]) [1] 1 returns 1. **What I want to do:** I'm looking for a way to count
2019 Oct 29
0
stats::reshape quadratic in number of input columns
Hi R-core, I have been performance testing R packages for wide-to-tall data reshaping and for the most part I see they differ by constant factors. However in one test, which involves converting into multiple output columns, I see that stats::reshape is in fact quadratic in the number of input columns. For example take the iris data, which has 4 input columns to reshape, and the desired output
2023 Mar 30
0
read.csv quadratic time in number of columns
Dear R-devel, A number of people have observed anecdotally that read.csv is slow for large number of columns, for example: https://stackoverflow.com/questions/7327851/read-csv-is-extremely-slow-in-reading-csv-files-with-large-numbers-of-columns I did a systematic comparison of read.csv with similar functions, and observed that read.csv is quadratic time (N^2) in the number of columns N, whereas
2009 Jul 20
1
locate substring in the string it belong to
Hi R users, I am trying generate the indices for locating a in the string it come from. Given the length of the string, it will take too long using the combn() for further comparison. I am wondering if R has any built-in function for this purpose. To make it concrete: this.substring="cc" this.string="ccc" start.location=1,2 end.location=2,3 Thanks in advance, Kevin
2010 Jul 07
3
use sliding window to count substrings found in large string
Hello together, I'm looking for advice on how to do some tests on strings. What I want to do is the following: (just an example, real strings/sequence are about 200-400 characters long) given set of Strings: String1 abcdefgh String2 bcdefgop use a sliding window of size x to create an vector of all subsequences of size x found in the set (order matters! ). Now create, for every string
2015 Sep 02
4
mclapply memory leak?
Dear R-devel, I am running mclapply with many iterations over a function that modifies nothing and makes no copies of anything. It is taking up a lot of memory, so it seems to me like this is a bug. Should I post this to bugs.r-project.org? A minimal reproducible example can be obtained by first starting a memory monitoring program such as htop, and then executing the following code while
2009 Sep 25
7
Spliting columns, strings or reg exp returning substrings
Currently as the first column in a data frame I have string values in the format xx_yy - I want to create a new column with just the substring xx (for each row in turn). Three possible ways to do this might be (1) split the string by '_' using strsplit and paste the first of the resulting variables into a new column, but I have been unable to do this for each row of my data frame in turn
2011 Apr 11
1
Getting many substrings but only loading the original string one time.
Hi All, I'm looking for a way to get many substrings from a longer string and then stitch them together. But, since the longer string is really, really long (like 250 MB long), I don't want to do this in a loop and load and re-load the longer string many times. Does anybody have an idea? Maybe I could pass in two vectors (the first would have the starting coordinates, and the second