search for: unbalanced_dquot

Displaying 3 results from an estimated 3 matches for "unbalanced_dquot".

2024 Apr 10
2
Exceptional slowness with read.csv
...r columns, then transforming the output into a two-column data.frame with columns Col - the column processed; Unbalanced - the rows with unbalanced double quotes. I am assuming the quotes are double quotes. It shouldn't be difficult to adapt it to other cases: single quotes, or both.

unbalanced_dquotes <- function(x) {
  char_cols <- sapply(x, is.character) |> which()
  lapply(char_cols, \(i) {
    y <- x[[i]]
    Unbalanced <- gregexpr('"', y) |>
      sapply(\(x) attr(x, "match.length") |> length()) |>
      {\(x) (x %% 2L) == 1L}() |>...
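The excerpt cuts the function off mid-pipeline. A plausible completion, written as a sketch rather than the poster's verbatim code: everything after the odd/even test is reconstructed from the two-column description above, and the quote counting is swapped to lengths(regmatches(...)), because length(attr(x, "match.length")) reports 1, not 0, for rows containing no quotes at all (gregexpr signals "no match" with -1), which would wrongly flag quote-free rows as unbalanced.

## Reconstructed sketch: flag, per character column, the rows whose
## number of double-quote characters is odd.
unbalanced_dquotes <- function(x) {
  char_cols <- sapply(x, is.character) |> which()
  lapply(char_cols, \(i) {
    y <- x[[i]]
    ## lengths(regmatches(...)) counts matches and returns 0 for rows
    ## with no quotes, unlike length(attr(., "match.length")).
    n_quotes <- lengths(regmatches(y, gregexpr('"', y)))
    Unbalanced <- which(n_quotes %% 2L == 1L)
    if (length(Unbalanced) == 0L) NULL
    else data.frame(Col = names(x)[i], Unbalanced = Unbalanced)
  }) |> do.call(what = rbind)   ## final assembly into one frame is assumed
}

The result matches the shape described in the excerpt: one row per offending cell, keyed by column name (Col) and row index (Unbalanced).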
2024 Apr 10
1
Exceptional slowness with read.csv
...two-column data.frame with columns
>
> Col - the column processed;
> Unbalanced - the rows with unbalanced double quotes.
>
> I am assuming the quotes are double quotes. It shouldn't be difficult
> to adapt it to other cases: single quotes, or both.
>
> unbalanced_dquotes <- function(x) {
>   char_cols <- sapply(x, is.character) |> which()
>   lapply(char_cols, \(i) {
>     y <- x[[i]]
>     Unbalanced <- gregexpr('"', y) |>
>       sapply(\(x) attr(x, "match.length") |> length()) |>
>       {\(x) (x...
2024 Apr 08
4
Exceptional slowness with read.csv
Greetings, I have a CSV file of 76 fields and about 4 million records. I know that some of the records have errors - unmatched quotes, specifically. Reading the file with readLines and parsing the lines with read.csv(text = ...) is really slow. I know that the first 2459465 records are good. So I try this:

> startTime <- Sys.time()
> first_records <- read.csv(file_name, nrows
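The excerpt stops mid-call, but the surrounding prose pins down the idea: time a read.csv() restricted to the rows known to parse cleanly. A minimal sketch of that pattern, taking nrows = 2459465 from the poster's own statement (file_name standing in for whatever path the poster used):

## Time how long read.csv takes on the known-good prefix of the file.
## nrows = 2459465 comes from the poster's prose; file_name is assumed
## to hold the CSV's path.
startTime <- Sys.time()
first_records <- read.csv(file_name, nrows = 2459465)
Sys.time() - startTime   ## elapsed wall-clock time for the prefix read

Bracketing with Sys.time() mirrors the poster's own approach; system.time() would be the more idiomatic one-liner for the same measurement.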