2024 Apr 10
2
Exceptional slowness with read.csv
...n a two-column data.frame with columns
Col - the column processed;
Unbalanced - the rows with unbalanced double quotes.
I am assuming the quotes are double quotes. It shouldn't be difficult to
adapt it to other cases: single quotes, or both.
unbalanced_dquotes <- function(x) {
  char_cols <- sapply(x, is.character) |> which()
  lapply(char_cols, \(i) {
    y <- x[[i]]
    Unbalanced <- gregexpr('"', y) |>
      sapply(\(x) attr(x, "match.length") |> length()) |>
      {\(x) (x %% 2L) == 1L}() |>
      which()
    data.frame(Col...
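A caveat with the snippet above: gregexpr() returns -1 (not an empty vector) when a string contains no quote at all, and the "match.length" attribute of that result still has length 1, so quote-free rows would appear to be flagged as unbalanced. A self-contained sketch of the same idea that counts matches explicitly (the helper name and the sample data are mine, not from the thread):

```r
# Sketch of the technique above; count '"' occurrences per element and
# flag rows where the count is odd.  gregexpr() returns -1 when there
# is no match, so only positive match positions are counted.
count_dquotes <- function(y) {
  vapply(gregexpr('"', y), function(m) sum(m > 0L), integer(1))
}

# Invented sample data for illustration:
df <- data.frame(
  a = c("ok", "has \"balanced\" quotes", "one \" stray quote"),
  b = 1:3
)

char_cols <- which(sapply(df, is.character))
# For each character column, the row indices with an odd quote count:
unbalanced <- lapply(char_cols, function(i) {
  which(count_dquotes(df[[i]]) %% 2L == 1L)
})
# only the third row of column a has an unmatched quote
```

Counting `sum(m > 0L)` rather than `length(attr(m, "match.length"))` is what keeps the no-match case at zero.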
2024 Apr 10
1
Exceptional slowness with read.csv
...ol - the column processed;
> Unbalanced - the rows with unbalanced double quotes.
>
> I am assuming the quotes are double quotes. It shouldn't be difficult
> to adapt it to other cases: single quotes, or both.
>
> unbalanced_dquotes <- function(x) {
>   char_cols <- sapply(x, is.character) |> which()
>   lapply(char_cols, \(i) {
>     y <- x[[i]]
>     Unbalanced <- gregexpr('"', y) |>
>       sapply(\(x) attr(x, "match.length") |> length()) |>
>       {\(x) (x %% 2L) == 1L}() |>
>       whic...
2024 Apr 08
4
Exceptional slowness with read.csv
Greetings,
I have a csv file of 76 fields and about 4 million records. I know that
some of the records have errors - unmatched quotes, specifically.
Reading the file with readLines and parsing the lines with read.csv(text
= ...) is really slow. I know that the first 2459465 records are good.
So I try this:
> startTime <- Sys.time()
> first_records <- read.csv(file_name, nrows
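One way to localize the bad record along these lines is to read the file in chunks and let the parse failure itself report the offending region. Everything below — the helper name, the chunk size, and treating warnings as failures (an unbalanced quote typically surfaces as scan()'s "EOF within quoted string" warning rather than an error) — is my sketch, not the poster's code:

```r
# Sketch only: chunked scan to localize a malformed record.
# Reads the file through a connection so each readLines() call
# advances past the lines already checked.
locate_bad_chunk <- function(file_name, chunk = 10000L) {
  con <- file(file_name, "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  start <- 1L                        # index of the next data row
  repeat {
    block <- readLines(con, n = chunk)
    if (length(block) == 0L) return(NA_integer_)  # whole file parsed cleanly
    ok <- tryCatch({
      read.csv(text = c(header, block))
      TRUE
    }, error = function(e) FALSE, warning = function(w) FALSE)
    if (!ok) return(start)           # bad record lies in [start, start + chunk)
    start <- start + length(block)
  }
}

# Tiny demonstration with an invented file:
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,x", "2,\"broken", "3,y"), tmp)
locate_bad_chunk(tmp, chunk = 1L)    # data row 2 holds the stray quote
```

With a large chunk this costs far fewer read.csv() calls than parsing line by line, and once a chunk fails the same function can be re-run on just that chunk with chunk = 1L to pin down the exact row.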